Sometimes some interesting problem comes by, last week was this Distribute strings evenly between a defined number of buckets. We end up using a hashing and modulus operation.

Hashing would take of the distribution and modulus would constrain the choices to the number we were expecting. I took the chance to remove some rust of my Perl 5 knowledge and also see how it look like in Perl 6 (yes, you read it right Perl 6, here's why I insist).

### Perl 5 solution

``````#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw/ md5 / ;

my %histogram;

foreach my \$serial (map { "ABC-\$_" } ('000'..'999')) {
\$histogram{unpack('L*', md5(\$serial)) % 30} .= "#";
}

foreach my \$key (sort {\$a <=> \$b} (keys %histogram)) {
print "\$key\t", \$histogram{\$key}, "\n";
}

my(\$min, \$max) = map { length(\$histogram{\$_}) }
((sort { length(\$histogram{\$a}) <=> length(\$histogram{\$b}) }
keys %histogram)[0,\$#_]);

print "maximum: ",\$max, "\n";
print "minimum: ",\$min, "\n";
``````

Let's explain what is happening here, you see that this script is using the MD5 (`line 4`) algorithm I had to use this algorithm so that I could use the same on Perl 6 which unfortunately isn't fully equipped with encryption algorithms yet.

The first `foreach` does the hard work, it first we use the range operation `'000'..'999'` which generates a sequence like `000, 001, 002, ...` adding a `map` to it just to produce something like `ABC-000, ABC-001, ABC-002, ...`.

On `line 9` lot of things are happening. First we calculate the MD5 of the sequence above. The `md5()` actually gives the binary result of the calculation, that's why we `unpack` it into a long. Than we have the `%` modulus operator, which will restrict our numbers from `0` to `29`.

There's a little trick here. We're using a associative array as data structure (known as hash in Perl). Each bucket (the ones between 0 and 29) will be a key and each time we repeat we concatenate the string with a `#`. Just so that we can print on the console a sort of graphical representation to check our hypothesis, a histogram of the distribution.

The second `foreach` does the printing, since hashes are not ordered we sort the keys before printing with `sort {\$a <=> \$b} (keys %histogram)`. After that, comes figuring out the maximum and minimum occupancy of the buckets, and frankly this line is not pretty.

To understand, first focus on the `sort` until the word `keys`. It is very similar with the one above with the difference that we call `length(\$histogram{\$a})`, which really means that we are sorting by the length of the values (remember, the values are strings of `#`s).

Since my Perl is rusted it took me sometime to code the rest of the expression. Look at the end and you'll see `[0,\$#_]`. The function `sort` returns an array, but I'm interested only in two elements, the first and the last element of this `sort` which will give me the maximum (the largest string) and the minimum (the smallest string). The `[0,\$#_]` is called a slice of an array, the `\$#_` means the last element, therefore of the output of that `sort` I get the first and the last element by slicing the resulting array.

But what `sort` is returning is the `keys` of the histogram sorted by value and I want the value. That's why you see a `map`, the `map` is processing the slice (which has only two elements) and calling `length` on each one which finally give us the maximum and the minimum occupancy of buckets.

### Perl 6 solution

``````#!/usr/bin/env perl6
use v6;
use Digest::MD5;

my \$digest = Digest::MD5.new;
my %histogram;

for ('000'..'999').map(-> \$s {"ABC-\$s"}) -> \$serial {
%histogram{:16(\$digest.md5_hex(\$serial)) % 30}~= '#';
}

for %histogram.keys.sort: -> \$a, \$b {\$a <=> \$b} -> \$key {
say "\$key\t%histogram{\$key}";
}

say "maximum: ", [max] %histogram.values».chars;
say "minimum: ", [min] %histogram.values».chars;

``````

On Perl 6 things are pretty similar, the first `for` (which does the `foreach`) does the same as in the Perl 5 solution. The only difference is that the `md5_hex()` function returns a string instead of a binary representation of the MD5, to convert it to a decimal representation we use `:16` which is part of the radix operators in Perl 6. Again, we use an associative array and concatenate `#` on each bucket.

The second `for` does the same thing again, it prints out the histogram by the order of its keys. Note that in Perl 6 everything is an object like Ruby, that's why the syntax differs, on Perl 5 we call functions most of the time.

The really different part, which is the reason of writing this post altogether, is the calculation of maximum and minimum occupancy. Here I had the chance to use some neat additions in Perl 6.

Fist the `[max]` and `[min]` are meta-operators. What they do in this case is a reduction, the `[max]` will return the maximum of a list and the `[min]` likewise the minimum. If you remember I'm looking for the max and min of the length of the values of this histogram. And this is where we can apply the `»` (a hyper operator). What is doing is calling `.chars` in each of the values, this in turn is fed to `[max]` which returns the maximum of these. So, that long line in the Perl 5 could be compressed in one line of Perl 6.

### Output

Both scripts will give an output similar to this one. Note, Perl 6 solution works on rakudo-star 2013.05:

``````0	#############################################
1	#################################
2	####################################
3	############################
4	###########################
5	##########################
6	##############################
7	###############################
8	###################################
9	###############################
10	################################
11	########################################
12	#######################################
13	######################
14	#####################################
15	########################################
16	########################
17	###############################################
18	##################################
19	####################################
20	##########################################
21	#######################
22	#################################
23	###########################################
24	#########################
25	############################
26	######################################
27	########################################
28	################################
29	#######################
maximum: 47
minimum: 22
``````

That's it.