Data.Tree
includes unfoldTreeM_BF
and unfoldForestM_BF
functions to construct trees breadth-first using the results of monadic actions. The tree unfolder can be written easily using the forest unfolder, so I'll focus on the latter:
unfoldForestM_BF :: Monad m =>
(b -> m (a, [b])) -> [b] -> m [Tree a]
Starting with a list of seeds, it applies a function to each, generating actions that will produce the tree roots and the seeds for the next level of unfolding. The algorithm used is somewhat strict, so using unfoldForestM_BF
with the Identity
monad is not exactly the same as using the pure unfoldForest
. I've been trying to figure out if there's a way to make it lazy without sacrificing its O(n)
time bound. If (as Edward Kmett suggested to me) this is impossible, I wonder if it would be possible to do it with a more constrained type, specifically requiring MonadFix
rather than Monad
. The concept there would be to (somehow) set up the pointers to the results of future computations while adding those computations to the to-do list, so if they are lazy in the effects of earlier computations they will be available immediately.
I previously claimed that the third solution presented below has the same strictness as the depth-first
unfoldForest
, which is not correct.Your intuition that trees can be lazily unfolded breadth first is at least partially correct, even if we don't require a
MonadFix
instance. Solutions exist for the special cases when the branching factor is known to be finite and when the branching factor is known to be "large". We will start with a solution that runs inO(n)
time for trees with finite branching factors including degenerate trees with only one child per node. The solution for finite branching factors will fail to terminate on trees with infinite branching factors, which we will rectify with a solution that that runs inO(n)
time for trees with "large" branching factors greater than one including trees with infinite branching factor. The solution for "large" branching factors will run inO(n^2)
time on degenerate trees with only one child or no children per node. When we combine the methods from both steps in an attempt to make a hybrid solution that runs inO(n)
time for any branching factor we will get a solution that is lazier than the first solution for finite branching factors but cannot accommodate trees that make a rapid transition from an infinite branching factor to having no branches.Finite Branching Factor
The general idea is that we will first build all the labels for an entire level and the seeds for the forests for the next level. We will then descend into the next level, building all of it. We will collect together the results from the deeper level to build the forests for the outer level. We will put the labels together with the forests to build the trees.
unfoldForestM_BF
is fairly simple. If there are no seeds for the level it returns. After building all the labels, it takes the seeds for each forest and collects them together into one list of all the seeds to build the next level and unfolds the entire deeper level. Finally it constructs the forest for each tree from the structure of the seeds.trace
reconstructs the structure of nested lists from a flattened list.It is assumed that there is an item in[b]
for each of the items anywhere in[[a]]
. The use ofconcat
...trace
to flatten all the information about ancestor levels prevents this implementation from working on trees with infinite children for a node.Unfolding a tree is trivial to write in terms of unfolding a forest.
Large Branching Factor
The solution for large branching factor proceeds in much the same way as the solution for finite branching factor, except it keeps the entire structure of the tree around instead of
concat
enating the branches in a level to a single list andtrace
ing that list. In addition to theimport
s used in the previous section, we will be usingCompose
to compose the functors for multiple levels of a tree together andTraversable
tosequence
across multi-level structures.Instead of flattening all of the ancestor structures together with
concat
we will wrap withCompose
the ancestors and the seeds for the next level and recurse on the entire structure.zipWithIrrefutable
is a lazier version ofzipWith
that relies on the assumption that there is an item in the second list for each item in the first list. TheTraceable
structures are theFunctors
that can provide azipWithIrrefutable
. The laws forTraceable
are for everya
,xs
, andys
iffmap (const a) xs == fmap (const a) ys
thenzipWithIrrefutable (\x _ -> x) xs ys == xs
andzipWithIrrefutable (\_ y -> y) xs ys == ys
. Its strictness is given for everyf
andxs
byzipWithIrrefutable f xs ⊥ == fmap (\x -> f x ⊥) xs
.We can combine two lists lazily if we already know they have the same structure.
We can combine the composition of two functors if we know that we can combine each functor.
isEmpty
checks for an empty structure of nodes to expand like the pattern match on[]
did in the solution for finite branching factors.The astute reader may notice that
zipWithIrrefutable
fromTraceable
is very similar toliftA2
which is half of the definition ofApplicative
.Hybrid Solution
The hybrid solution combines the approaches of the finite solution and the "large" solution. Like the finite solution, we will compress and decompress the tree representation at each step. Like the solution for "large" branching factors we will use a data structure that allows stepping over complete branches. The finite branching factor solution used a data type that is flattened everywhere,
[b]
. The "large" branching factor solution used a data type that was flattened nowhere: increasingly nested lists starting with[b]
then[[b]]
then[[[b]]]
and so on. In between these structures would be nested lists that either stop nesting and just hold ab
or keep nesting and hold[b]
s. That pattern of recursion is described in general by theFree
monad.We will be working specifically with
Free []
which looks like.For the hybrid solution we will repeat all of its imports and components so that the code below here should be complete working code.
Since we will be working with
Free []
, we will provide it with azipWithIrrefutable
.The breadth first traversal will look very similar to the original version for the finitely branching tree. We build the current labels and seeds for the current level, compress the structure of the remainder of the tree, do all the work for the remaining depths, and decompress the structure of the results to get the forests to go with the labels.
compress
takes aFree []
holding the seeds for forests[b]
and flattens the[b]
into theFree
to get aFree [] b
. It also returns adecompress
function that can be used to undo the flattening to get the original structure back. We compress away branches with no remaining seeds and branches that only branch one way.Each compression step also returns a function that will undo it when applied to a
Free []
tree with the same structure. All of these functions are partially defined; what they do toFree []
trees with a different structure is undefined. For simplicity we also define partial functions for the inverses ofPure
andFree
.Both
unfoldForestM_BF
andunfoldTreeM_BF
are defined by packaging their argument up into aFree [] b
and unpackaging the results assuming they are in the same structure.A more elegant version of this algorithm can probably be made by recognizing that
>>=
for aMonad
is grafting on trees and bothFree
andFreeT
provide monad instances. Bothcompress
andcompressList
probably have more elegant presentations.The algorithm presented above is not lazy enough to allow querying trees that branch an infinite number of ways and then terminate. A simple counter example is the following generating function unfolded from
0
.This tree would look like
Attempting to descend the second branch (to
2
) and inspect the remaining finite sub-tree will fail to terminate.Examples
The following examples demonstrate that all implementations of
unfoldForestM_BF
run actions in breadth first order and thatrunIdentity . unfoldTreeM_BF (Identity . f)
has the same strictness asunfoldTree
for trees with finite branching factor. For trees with inifinite branching factor, only the solution for "large" branching factors has the same strictness asunfoldTree
. To demonstrate laziness we'll define three infinite trees - a unary tree with one branch, a binary tree with two branches, and an infinitary tree with an infinite number of branches for each node.Together with
unfoldTree
, we will defineunfoldTreeDF
in terms ofunfoldTreeM
to check thatunfoldTreeM
really is lazy like you claimed andunfoldTreeBF
in terms ofunfoldTreeMFix_BF
to check that the new implementation is just as lazy.To get finite pieces of these infinite trees, even the infinitely branching one, we'll define a way to take from a tree as long as its labels match a predicate. This could be written more succinctly in terms of the ability to apply a function to every
subForest
.This lets us define nine example trees.
All five methods have the same output for the unary and binary trees. The output comes from
putStrLn . drawTree . fmap show
However, the breadth first traversal from the finite branching factor solution is not sufficiently lazy for a tree with an infinite branching factor. The other four methods output the entire tree
The tree generated with
unfoldTreeBF
for the finite branching factor solution can never be completely drawn past its first branches.The construction is definitely breadth first.
Running
binaryDepths
outputs the outer levels before the inner onesFrom Lazy to Downright Slothful
The hybrid solution from the earlier section is not lazy enough to have the same strictness semantics as
Data.Tree
'sunfoldTree
. It is the first in a series of algorithms, each slightly lazier than their predecessor, but none lazy enough to have the same strictness semantics asunfoldTree
.The hybrid solution does not provide a guarantee that exploring a portion of a tree doesn't demand exploring other portions of the same tree. Nor will the code presented below. In one particular yet common case identified by dfeuer exploring only a
log(N)
sized slice of a finite tree forces the entirety of the tree. This happens when exploring the last descendant of each branch of a tree with constant depth. When compressing the tree we throw out every trivial branch with no descendants, which is necessary to avoidO(n^2)
running time. We can only lazily skip over this portion of compression if we can quickly show that a branch has at least one descendant and we can therefore reject the patternFree []
. At the greatest depth of the tree with constant depth, none of the branches have any remaining descendants, so we can never skip a step of the compression. This results in exploring the entire tree to be able to visit the very last node. When the entire tree to that depth is non-finite due to infinite branching factor, exploring a portion of the tree fails to terminate when it would terminate when generated byunfoldTree
.The compression step in the hybrid solution section compresses away branches with no descendants in the first generation that they can be discovered in, which is optimal for compression but not optimal for laziness. We can make the algorithm lazier by delaying when this compression occurs. If we delay it by a single generation (or even any constant number of generations) we will maintain the
O(n)
upper bound on time. If we delay it by a number of generations that somehow depends onN
we would necessarily sacrifice theO(N)
time bound. In this section we will delay the compression by a single generation.To control how compression happens, we will separate stuffing the innermost
[]
into theFree []
structure from compressing away degenerate branches with 0 or 1 descendants.Because part of this trick doesn't work without a lot of laziness in the compression, we will adopt a paranoid level of excessively slothful laziness everywhere. If anything about a result other than the tuple constructor
(,)
could be determined without forcing part of its input with a pattern match we will avoid forcing it until it is necessary. For tuples, anything pattern matching on them will do so lazily. Consequently, some of the code below will look like core or worse.bindFreeInvertible
replacesPure [b,...]
withFree [Pure b,...]
compressFreeList
removes occurrences ofFree []
and replacesFree [xs]
withxs
.The overall compression will not bind the
Pure []
s intoFree
s until after the degenerateFree
s have been compressed away, delaying compression of degenerateFree
s introduced in one generation to the next generation's compression.Out of continued paranoia, the helpers
getFree
andgetPure
are also made irrefutably lazy.This very quickly solves the problematic example dfeuer discovered
But since we only delayed the compression by
1
generation, we can recreate exactly the same problem if the very last node of the very last branch is1
level deeper than all of the other branches.