I need a subroutine that, given a set of characters, will generate all possible combinations of those characters of length k. Order matters and reuse is allowed, so if k = 2
then AB != BA
and AA
is an option. I found some working examples on PerlMonks, but unfortunately they are code golf and not easy for me to wrap my mind around. Can someone please do one or more of the following?
- Give a breakdown and explanation of how the first algorithm works.
- De-obfuscate the code so that the meaning is clearer.
- Point me toward another example that is clearer.
Thanks!
You can use variations_with_repetition from Algorithm::Combinatorics (which also provides an iterator-based interface), but if you just need a list, this is a fairly simple recursive algorithm:
sub ordered_combinations
{
my ($data, $k) = @_;
return @$data if $k == 1;
my @previous = ordered_combinations($data, $k-1);
my @results;
for my $letter (@$data) {
push @results, map { $letter . $_ } @previous;
}
return @results;
} # end ordered_combinations
print "$_\n" for ordered_combinations([qw(a b c)], 3);
This is basically the same algorithm the code golfers are using, but I'm using a for
loop instead of nesting map
. Also, I recurse only once per level (code golf is about minimizing source code, not runtime).
Any recursive function can be converted to an iterative one, which usually reduces its overhead. This one is fairly simple:
sub ordered_combinations
{
my ($data, $k) = @_;
return if $k < 1;
my $results = $data;
while (--$k) {
my @new;
for my $letter (@$data) {
push @new, map { $letter . $_ } @$results;
} # end for $letter in @$data
$results = \@new;
} # end while --$k is not 0
return @$results;
} # end ordered_combinations
This version handles the $k == 0
case, which the original did not.
I had a look at the very first piece of code on the page you referred to:
sub c{my$n=-1+shift;$n?map{my$c=$_;map$c.$_,c($n,@_)}@_:@_}
I have spread it out a bit to make it more readable; also I have made some changes to it to make it clearer (see combinations
):
#!/usr/bin/perl
use strict;
use warnings;
sub c {
my $n=-1+shift;
$n ? map{
my $c = $_;
map $c . $_ , c($n ,@_)
} @_
: @_;
}
sub combinations {
my $number = shift; # remove the first item from @_
my @chars = @_; # the remainder of @_
$number --; # decrement $number, so that you will eventually exit
# from this recursive subroutine (once $number == 0)
if ($number) { # true as long as $number != 0 and $number not undef
my @result;
foreach my $char (@chars) {
my @intermediate_list = map { $char . $_ } combinations($number, @chars);
push @result, @intermediate_list;
}
return @result; # the current concatenation result will be used for creation of
# @intermediate_list in the 'subroutine instance' that called 'combinations'
}
else {
return @chars;
}
}
print join " ", combinations(2, "A", "B");
print "\n";
print join " ", c(2, "A", "B");
print "\n\n";
print join " ", combinations(3, "A", "B");
print "\n";
print join " ", c(3, "A", "B");
print "\n";
Both versions work in the same way, and they produce exactly the same output:
AA AB BA BB
AA AB BA BB
AAA AAB ABA ABB BAA BAB BBA BBB
AAA AAB ABA ABB BAA BAB BBA BBB
I included some comments in the code, but perhaps a lengthier explanation is in order!? Well, here's an example to illustrate how things work: let's say we've got two items, "A" and "B", and we want to get all possible combinations of 2 of these items. In that case, $number
will initially be equal to 2 (as we want to get pairs), and @chars
will be equal to ('A', 'B')
.
The first time combinations
is called, $number
is decremented to 1, thus the if
condition is met, and we enter the foreach
loop. This first sets $char
to 'A'. It then calls combinations(1, ('A', 'B'))
. As $number
always gets decremented when the subroutine is called, $number
is 0 in this 'child subroutine', consequently the child simply returns ('A', 'B'). Thus:
@intermediate_list = map { $char . $_ } ('A', 'B'); # $char eq 'A'
map
then takes both 'A' and 'B' and concatenates each with 'A' ($char), thus @intermediate_list
is ('AA', 'AB'). In the next round of the foreach
loop, the same is done with $char = B
, which sets @intermediate_list
to ('BA', 'BB').
In each round the contents of @intermediate_list
are pushed into the result list, hence @result
eventually contains all possible combinations.
If you want to get triplets instead of pairs, you will obviously start with $number = 3
, and combinations
will be called three times. The second time it's called it will return @result
, i.e. a pair-containing list. Each item from that list will be concatenated with each character of the initial character set.
Okay, I hope this makes sense. Please ask in case something has not become clear.
EDIT: Please see ysth's comment below.