I am impatient, looking forward to understanding catamorphism related to this SO question :)
I have only practiced the beginning of Real World Haskell tutorial. So, Maybe I'm gonna ask for way too much right now, if it was the case, just tell me the concepts I should learn.
Below, I quote the wikipedia code sample for catamorphism.
I would like to know your opinion about foldTree below, a way of traversing a Tree, compared to this other SO question and answer, also dealing with traversing a Tree n-ary tree traversal. (independantly from being binary or not, I think the catamorphism below can be written so as to manage n-ary tree)
I put in comment what I understand, and be glad if you could correct me, and clarify some things.
{-this is a binary tree definition-}
data Tree a = Leaf a
| Branch (Tree a) (Tree a)
{-I dont understand the structure between{}
however it defines two morphisms, leaf and branch
leaf take an a and returns an r, branch takes two r and returns an r-}
data TreeAlgebra a r = TreeAlgebra { leaf :: a -> r
, branch :: r -> r -> r }
{- foldTree is a morphism that takes: a TreeAlgebra for Tree a with result r, a Tree a
and returns an r -}
foldTree :: TreeAlgebra a r -> Tree a -> r
foldTree a@(TreeAlgebra {leaf = f}) (Leaf x ) = f x
foldTree a@(TreeAlgebra {branch = g}) (Branch l r) = g (foldTree a l) (foldTree a r)
at this point I am having many difficulties, I seem to guess that the morphism leaf
will be applied to any Leaf
But so as to use this code for real, foldTree needs to be fed a defined TreeAlgebra,
a TreeAlgebra that has a defined morphism leaf so as to do something ?
but in this case in the foldTree code I would expect {f = leaf} and not the contrary
Any clarification from you would be really welcome.
Not exactly sure what you're asking. But yeah, you feed a
TreeAlgebra
tofoldTree
corresponding to the computation you want to perform on the tree. For example, to sum all the elements in a tree ofInt
s you would use this algebra:Which means, to get the sum of a leaf, apply
id
(do nothing) to the value in the leaf. To get the sum of a branch, add together the sums of each of the children.The fact that we can say
(+)
for branch instead of, say,\x y -> sumTree x + sumTree y
is the essential property of the catamorphism. It says that to compute some functionf
on some recursive data structure it suffices to have the values off
for its immediate children.Haskell is a pretty unique language in that we can formalize the idea of catamorphism abstractly. Let's make a data type for a single node in your tree, parameterized over its children:
See what we did there? We just replaced the recursive children with a type of our choosing. This is so that we can put the subtrees' sums there when we are folding.
Now for the really magical thing. I'm going to write this in pseudohaskell -- writing it in real Haskell is possible, but we have to add some annotations to help the typechecker which can be kind of confusing. We take the "fixed point" of a parameterized data type -- that is, constructing a data type
T
such thatT = TreeNode a T
. They call this operatorMu
.Look carefully here. The argument to
Mu
isn't a type, likeInt
orFoo -> Bar
. It's a type constructor likeMaybe
orTreeNode Int
-- the argument toMu
itself takes an argument. (The possibility of abstracting over type constructors is one of the things that makes Haskell's type system really stand out in its expressive power).So the type
Mu f
is defined as takingf
and filling in its type parameter withMu f
itself. I'm going to define a synonym to reduce some of the noise:Expanding
Mu IntNode
, we get:Do you see how
Mu IntNode
is equivalent to yourTree Int
? We have just torn the recursive structure apart and then usedMu
to put it back together again. This gives us the advantage that we can talk about allMu
types at once. This gives us what we need to define a catamorphism.Let's define:
I said the essential property of the catamorphism is that to compute some function
f
, it suffices to have the values off
for its immediate children. Let's call the type of the thing we are trying to computer
, and the data structurenode
(IntNode
would be a possible instantiation of this). So to computer
on a particular node, we need the node with its children replaced with theirr
s. This computation has typenode r -> r
. So a catamorphism says that if we have one of these computations, then we can computer
for the entire recursive structure (remember recursion is denoted explicitly here withMu
):Making this concrete for our example, this looks like:
Restating, if we can take a node with
r
s for its children and compute anr
, then we can compute anr
for an entire tree.In order to actually compute this, we need
node
to be aFunctor
-- that is we need to be able to map an arbitrary function over the children of a node.This can be done straightforwardly for
IntNode
.Now, finally, we can give a definition for
cata
(theFunctor node
constraint just says thatnode
has a suitablefmap
):I used the parameter name
t
for the mnemonic value of "tree". This is an abstract, dense definition, but it is really very simple. It says: recursively performcata f
-- the computation we are doing over the tree -- on each oft
's children (which are themselvesMu node
s) to get anode r
, and then pass that result tof
compute the result fort
itself.Tying this back to the beginning, the algebra you are defining is essentially a way of defining that
node r -> r
function. Indeed, given aTreeAlgebra
, we can easily get the fold function:Thus the tree catamorphism can be defined in terms of our generic one as follows:
I'm out of time. I know that got really abstract really fast, but I hope it at least gave you a new viewpoint to help your learning. Good luck!
I think you were were asking a question about the {}'s. There is an earlier question with a good discussion of {}'s. Those are called Haskell's record syntax. The other question is why construct the algebra. This is a typical function paradigm where you generalize data as functions.
The most famous example is Church's construction of the Naturals, where
f = + 1
andz = 0
,0 = z
,1 = f z
,2 = f (f z)
,3 = f (f (f z))
, etc...What you are seeing is essentially the same idea being applied to a tree. Work the church example and the tree will click.