I am trying to extract the subsets of length k using filter, but I'm not sure how to approach it. The list has 100 elements.
subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) = [zs | ys <- subsets xs, zs <- [ys, (x:ys)]]
If I use filter, this is what I thought it would be:
filter (length(3)) subsets [1,2,3,4,5]
But I'm probably wrong. Is there a different approach other than filter? I'm new to Haskell, so I'm not exactly sure.
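The attempt above does not type-check: `length(3)` applies `length` to the number 3 instead of comparing a length with 3, and `filter` receives `subsets` and `[1,2,3,4,5]` as two separate arguments. A minimal sketch of a working filter, plus the equivalent `foldr` that the first answer below mentions. The names `kSubsetsFilter` and `kSubsetsFoldr` are hypothetical, and both versions still build all 2^n subsets before discarding most of them:

```haskell
-- All subsets, as defined in the question.
subsets :: [a] -> [[a]]
subsets [] = [[]]
subsets (x:xs) = [zs | ys <- subsets xs, zs <- [ys, x : ys]]

-- Hypothetical helper: keep only the subsets whose length is k.
-- (length s == k) is written point-free as ((== k) . length).
kSubsetsFilter :: Int -> [a] -> [[a]]
kSubsetsFilter k xs = filter ((== k) . length) (subsets xs)

-- Hypothetical helper: the same selection written with foldr.
-- foldr walks the list from the right, so the output order matches filter.
kSubsetsFoldr :: Int -> [a] -> [[a]]
kSubsetsFoldr k xs = foldr keep [] (subsets xs)
  where
    keep s acc
      | length s == k = s : acc
      | otherwise     = acc
```

For example, `kSubsetsFilter 3 [1,2,3,4,5]` yields the 10 three-element subsets. With 100 elements, though, this enumerates far too many subsets to be practical.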
When I get a little confused about filtering, I go one level up and use `foldr`. In this case it would be as simple as:

With `filter`, it should be:

After thinking a lot, and with the help of chi and by asking this question, I was able to solve it:
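The code of this answer is not preserved here. A later comment credits Damián Rafael Lattenero with a `tails`-based approach for a length known at compile time. Assuming that is the idea, a sketch for length 3 uses one `tails` generator per chosen element (the name `subsetsOf3` is hypothetical):

```haskell
import Data.List (tails)

-- Hypothetical: all 3-element subsets of xs, in input order.
-- Each generator picks one element and continues with the suffix after it,
-- so no element is reused and no subset appears twice.
-- Pattern-match failures (the final empty suffix) are simply skipped.
subsetsOf3 :: [a] -> [[a]]
subsetsOf3 xs =
  [ [a, b, c]
  | (a : rest1) <- tails xs
  , (b : rest2) <- tails rest1
  , c <- rest2
  ]
```

`subsetsOf3 [1,2,3,4]` gives `[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]`. Only the 3-element subsets are ever built, so this stays cheap even for 100 elements.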
Some examples:
And now you are able to make your monster a little puppet:
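Again the original code is missing. One way to turn the fixed-length comprehension into a function of k is to recurse on the same `tails` pattern. This is a hedged sketch with a hypothetical name, `combinations`, and not necessarily what the answer did:

```haskell
import Data.List (tails)

-- Hypothetical generalisation: all k-element subsets of xs.
-- Pick a first element y, then recursively pick the remaining k-1
-- elements from the suffix that follows y.
combinations :: Int -> [a] -> [[a]]
combinations 0 _  = [[]]
combinations k xs =
  [ y : rest
  | (y : ys) <- tails xs
  , rest <- combinations (k - 1) ys
  ]
```

For example, `combinations 2 "abc"` returns `["ab","ac","bc"]`, and asking for more elements than the list has returns `[]`.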
Here's a general solution for length-n subsets not using filter.
If our initial list is `x:xs`, notice that we can partition these subsets into those containing `x` and those not containing `x`. This shows us a nice recursive structure: the first partition is `x` prepended to each length-(n-1) subset of `xs`, and the second is just the length-n subsets of `xs`.

All we need are the base cases. There is a single length-0 subset, and no subset is larger than the original:

Stick these bases above the recursive step, throw an appropriate type signature on it, and we're done.
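Putting the pieces together gives something like the following sketch. The function name `subsetsOfLength` is hypothetical. The second base case reads "no subset is larger than the original" as "an empty list has no subsets of positive length", and the recursion then carries that to any n larger than the list:

```haskell
subsetsOfLength :: Int -> [a] -> [[a]]
-- Base case: exactly one subset of length 0, the empty one.
subsetsOfLength 0 _  = [[]]
-- Base case: nothing of positive length can be drawn from an empty list.
subsetsOfLength _ [] = []
-- Recursive step: subsets containing x, followed by subsets not containing x.
subsetsOfLength n (x : xs) =
  map (x :) (subsetsOfLength (n - 1) xs) ++ subsetsOfLength n xs
```

`subsetsOfLength 2 [1,2,3]` evaluates to `[[1,2],[1,3],[2,3]]`. Swapping the two operands of `(++)` changes the order of the output, not its contents.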
Nice.

Be careful: `(++)` is slow, because it has to copy its entire left operand. If you know at compile time the length you'll be using, Damián Rafael Lattenero's `tails` approach may be more performant. Not entirely sure about this, though. Also, depending on the values, you might do well to swap the operands of `(++)`. I haven't yet done the math.

The number of subsets of a list of 100 elements is $2^{100} \approx 1.27 \times 10^{30}$, a really huge number. So the `filter` approach does not seem practical. The problem should be solved by manipulating lists containing just a few numbers between 1 and 100.

So we aim to write a function named `kSubsets` which returns the list of all subsets of cardinality k, where k is the first argument.
A solution based on recursive list processing:
A possible way to build `kSubsets` is to use an auxiliary `kIndexSubsets` function which computes the zero-based indexes of the elements instead of the elements themselves. `kIndexSubsets` can be written in a recursive fashion, and `kSubsets` is then essentially a wrapper which maps the element indexes to the actual list elements.

We can now test our `kSubsets` function. This involves checking that the length of the resulting output list matches the classic combinatorics formula $\frac{n!}{k!\,(n-k)!}$, where n is the length of the input list.
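The code for this version is not preserved. A minimal sketch of the index-based design, assuming `kIndexSubsets k n` returns strictly increasing index lists over `[0 .. n-1]` and that the wrapper looks elements up in a `Data.Array`. The bodies are a reconstruction, and `choose` is a hypothetical helper for the length check:

```haskell
import Data.Array (listArray, (!))

-- Reconstruction: all strictly increasing lists of k zero-based
-- indexes drawn from [0 .. n-1].
kIndexSubsets :: Int -> Int -> [[Int]]
kIndexSubsets k n
  | k < 0     = []
  | otherwise = go k 0
  where
    -- go j lo: choose j more indexes, each at least lo.
    go 0 _ = [[]]
    go j lo =
      [ i : rest
      | i <- [lo .. n - j]          -- leave room for the j-1 indexes after i
      , rest <- go (j - 1) (i + 1)
      ]

-- Wrapper: turn each index list into the corresponding elements.
-- The array gives O(1) lookups instead of O(i) with (!!).
kSubsets :: Int -> [a] -> [[a]]
kSubsets k xs = map (map (arr !)) (kIndexSubsets k n)
  where
    n   = length xs
    arr = listArray (0, n - 1) xs

-- Hypothetical helper for the length check: n! / (k! * (n-k)!).
choose :: Integer -> Integer -> Integer
choose n k = product [n - k + 1 .. n] `div` product [1 .. k]
```

With this sketch, `length (kSubsets 3 [1 .. 100])` should equal `choose 100 3`, that is 100·99·98/6 = 161700.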
`kSubsets 3 [1 .. 100]` takes less than 50 ms on a plain vanilla x86-64 Linux machine.

An alternative solution based on a state machine:
The (reversed) list of chosen indexes is taken to be the state of an automaton, and we advance the state step by step, until this is no longer possible, at which point the list of sublists is complete.
Basically, if there is room to advance the rightmost index, we do so; otherwise we recurse to advance the rest of the list, and then move the rightmost index as far left as possible.
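The source for this version is also missing. A hedged sketch of the stepping rule just described follows. It assumes the state is the reversed index list, and that `ksAdvance` takes the input length n and returns `Nothing` when no next state exists. The text only names `ksAdvance` and `kIndexSubsets`; the signatures and the `limit` parameter are assumptions:

```haskell
import Data.List (unfoldr)

-- Reconstruction of the stepping function.
-- n     : number of elements in the input list
-- state : the chosen zero-based indexes, stored reversed (rightmost first)
-- Returns Nothing once the last state has been reached.
ksAdvance :: Int -> [Int] -> Maybe [Int]
ksAdvance n = step (n - 1)
  where
    -- `limit` is the largest value the head of this (sub)state may take.
    step :: Int -> [Int] -> Maybe [Int]
    step _ [] = Nothing
    step limit (i : rest)
      -- Room to advance the rightmost index: just bump it.
      | i < limit = Just (i + 1 : rest)
      -- Otherwise advance the rest, then place the rightmost index
      -- as far left as possible, just after its new neighbour.
      | otherwise = fmap resetHead (step (limit - 1) rest)

    -- Only ever called on a non-empty list; the [] case keeps it total.
    resetHead :: [Int] -> [Int]
    resetHead [] = [0]
    resetHead rest@(j : _) = j + 1 : rest

-- All increasing k-index lists over [0 .. n-1], obtained by running the
-- automaton from its first state until ksAdvance gives up.
kIndexSubsets :: Int -> Int -> [[Int]]
kIndexSubsets k n
  | k < 0 || k > n = []
  | otherwise      = map reverse (unfoldr next (Just start))
  where
    start = reverse [0 .. k - 1]
    next Nothing  = Nothing
    next (Just s) = Just (s, ksAdvance n s)
```

Starting from `[k-1, ..., 0]`, the states come out in lexicographic order of the un-reversed index lists; for example, `kIndexSubsets 2 4` gives `[[0,1],[0,2],[0,3],[1,2],[1,3],[2,3]]`.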
The approach gives this alternative source code for `kIndexSubsets`, in which the key piece is the `ksAdvance` stepping function:

This algorithm seems less memory-hungry and faster than the first one.
Using this main program for a quick performance test, with subsets of 5 elements out of 100, generating 75287520 subsets:
Memory performance is improved:
Not yet as good as Fortran but getting close :-)