TL;DR: I want the exact behavior as filter ((== 4) . length) . subsequences
. Just using subsequences
also creates variable length of lists, which takes a lot of time to process. Since in the end only lists of length 4 are needed, I was thinking there must be a faster way.
I have a list of functions. The list has the type [Wor -> Wor]
The list looks something like this
[f1, f2, f3 .. fn]
What I want is a list of lists of n
functions while preserving order like this
input : [f1, f2, f3 .. fn]
argument : 4 functions
output : A list of lists of 4 functions.
Expected output would be where if there's an f1
in the sublist, it'll always be at the head
of the list.
If there's a f2
in the sublist and if the sublist doens't have f1
, f2
would be at head
. If fn
is in the sublist, it'll be at last
.
In general if there's a fx
in the list, it never will be infront of f(x - 1)
.
Basically preserving the main list's order when generating sublists.
It can be assumed that length of list will always be greater then given argument.
I'm just starting to learn Haskell so I haven't tried all that much but so far this is what I have tried is this:
Generation permutations with subsequences
function and applying (filter (== 4) . length)
on it seems to generate correct permutations -but it doesn't preserve order- (It preserves order, I was confusing it with my own function).
So what should I do?
Also if possible, is there a function or a combination of functions present in Hackage
or Stackage
which can do this? Because I would like to understand the source.
You describe a nondeterministic take
:
ndtake :: Int -> [a] -> [[a]]
ndtake 0 _ = [[]]
ndtake n [] = []
ndtake n (x:xs) = map (x:) (ndtake (n-1) xs) ++ ndtake n xs
Either we take an x
, and have n-1
more to take from xs
; or we don't take the x
and have n
more elements to take from xs
.
Running:
> ndtake 3 [1..4]
[[1,2,3],[1,2,4],[1,3,4],[2,3,4]]
Update: you wanted efficiency. If we're sure the input list is finite, we can aim at stopping as soon as possible:
ndetake n xs = go (length xs) n xs
where
go spare n _ | n > spare = []
go spare n xs | n == spare = [xs]
go spare 0 _ = [[]]
go spare n [] = []
go spare n (x:xs) = map (x:) (go (spare-1) (n-1) xs)
++ go (spare-1) n xs
Trying it:
> length $ ndetake 443 [1..444]
444
The former version seems to be stuck on this input, but the latter one returns immediately.
But, it measures the length of the whole list, and needlessly so, as pointed out by @dfeuer in the comments. We can achieve the same improvement in efficiency while retaining a bit more laziness:
ndzetake :: Int -> [a] -> [[a]]
ndzetake n xs | n > 0 =
go n (length (take n xs) == n) (drop n xs) xs
where
go n b p ~(x:xs)
| n == 0 = [[]]
| not b = []
| null p = [(x:xs)]
| otherwise = map (x:) (go (n-1) b p xs)
++ go n b (tail p) xs
Now the last test also works instantly with this code as well.
There's still room for improvement here. Just as with the library function subsequences
, the search space could be explored even more lazily. Right now we have
> take 9 $ ndzetake 3 [1..]
[[1,2,3],[1,2,4],[1,2,5],[1,2,6],[1,2,7],[1,2,8],[1,2,9],[1,2,10],[1,2,11]]
but it could be finding [2,3,4]
before forcing the 5
out of the input list. Shall we leave it as an exercise?
Here's the best I've been able to come up with. It answers the challenge Will Ness laid down to be as lazy as possible in the input. In particular, ndtake m ([1..n]++undefined)
will produce as many entries as possible before throwing an exception. Furthermore, it strives to maximize sharing among the result lists (note the treatment of end
in ndtakeEnding'
). It avoids problems with badly balanced list appends using a difference list. This sequence-based version is considerably faster than any pure-list version I've come up with, but I haven't teased apart just why that is. I have the feeling it may be possible to do even better with a better understanding of just what's going on, but this seems to work pretty well.
Here's the general idea. Suppose we ask for ndtake 3 [1..5]
. We first produce all the results ending in 3
(of which there is one). Then we produce all the results ending in 4
. We do this by (essentially) calling ndtake 2 [1..3]
and adding the 4
onto each result. We continue in this manner until we have no more elements.
import qualified Data.Sequence as S
import Data.Sequence (Seq, (|>))
import Data.Foldable (toList)
We will use the following simple utility function. It's almost the same as splitAtExactMay
from the 'safe' package, but hopefully a bit easier to understand. For reasons I haven't investigated, letting this produce a result when its argument is negative leads to ndtake
with a negative argument being equivalent to subsequences
. If you want, you can easily change ndtake
to do something else for negative arguments.
-- to return an empty list in the negative case.
splitAtMay :: Int -> [a] -> Maybe ([a], [a])
splitAtMay n xs
| n <= 0 = Just ([], xs)
splitAtMay _ [] = Nothing
splitAtMay n (x : xs) = flip fmap (splitAtMay (n - 1) xs) $
\(front, rear) -> (x : front, rear)
Now we really get started. ndtake
is implemented using ndtakeEnding
, which produces a sort of "difference list", allowing all the partial results to be concatenated cheaply.
ndtake :: Int -> [t] -> [[t]]
ndtake n xs = ndtakeEnding n xs []
ndtakeEnding :: Int -> [t] -> ([[t]] -> [[t]])
ndtakeEnding 0 _xs = ([]:)
ndtakeEnding n xs = case splitAtMay n xs of
Nothing -> id -- Not enough elements
Just (front, rear) ->
(front :) . go rear (S.fromList front)
where
-- For each element, produce a list of all combinations
-- *ending* with that element.
go [] _front = id
go (r : rs) front =
ndtakeEnding' [r] (n - 1) front
. go rs (front |> r)
ndtakeEnding
doesn't call itself recursively. Rather, it calls ndtakeEnding'
to calculate the combinations of the front part. ndtakeEnding'
is very much like ndtakeEnding
, but with a few differences:
- We use a
Seq
rather than a list to represent the input sequence. This lets us split and snoc cheaply, but I'm not yet sure why that seems to give amortized performance that is so much better in this case.
- We already know that the input sequence is long enough, so we don't need to check.
- We're passed a tail (
end
) to add to each result. This lets us share tails when possible. There are lots of opportunities for sharing tails, so this can be expected to be a substantial optimization.
- We use
foldr
rather than pattern matching. Doing this manually with pattern matching gives clearer code, but worse constant factors. That's because the :<|
, and :|>
patterns exported from Data.Sequence
are non-trivial pattern synonyms that perform a bit of calculation, including amortized O(1) allocation, to build the tail or initial segment, whereas folds don't need to build those.
NB: this implementation of ndtakeEnding'
works well for recent GHC and containers; it seems less efficient for earlier versions. That might be the work of Donnacha Kidney on foldr
for Data.Sequence
. In earlier versions, it might be more efficient to pattern match by hand, using viewl
for versions that don't offer the pattern synonyms.
ndtakeEnding' :: [t] -> Int -> Seq t -> ([[t]] -> [[t]])
ndtakeEnding' end 0 _xs = (end:)
ndtakeEnding' end n xs = case S.splitAt n xs of
(front, rear) ->
((toList front ++ end) :) . go rear front
where
go = foldr go' (const id) where
go' r k !front = ndtakeEnding' (r : end) (n - 1) front . k (front |> r)
-- With patterns, a bit less efficiently:
-- go Empty _front = id
-- go (r :<| rs) !front =
-- ndtakeEnding' (r : end) (n - 1) front
-- . go rs (front :|> r)