The Context
The context of this question is that I want to play around with Gene Expression Programming (GEP), a form of evolutionary algorithm, using Erlang. GEP makes use of a string based DSL called 'Karva notation'. Karva notation is easily translated into expression parse trees, but the translation algorithm assumes an implementation having mutable objects: incomplete sub-expressions are created early-on the translation process and their own sub-expressions are filled-in later-on with values that were not known at the time they were created.
The purpose of Karva notation is that it guarantees syntactically correct expressions are created without any expensive encoding techniques or corrections of genetic code. The problem is that with a single-assignment programming language like Erlang, I have to recreate the expression tree continually as each sub expression gets filled in. This takes an inexpensive - O(n)? - update operation and converts it into one that would complete in exponential time (unless I'm mistaken). If I can't find an efficient functional algorithm to convert K-expressions into expression trees, then one of the compelling features of GEP is lost.
The Question
I appreciate that the K-expression translation problem is pretty obscure, so what I want is advice on how to convert an inherently-non-functional algorithm (alg that exploits mutable data structures) into one that does not. How do pure functional programming languages adapt many of the algorithms and data structures that were produced in the early days of computer science that depend on mutability to get the performance characteristics they need?
I think I figured out how to solve your particular problem with the K trees, (the general problem is too hard :P). My solution is presented in some horrible sort of hybrid Python-like psudocode (I am very slow on my FP today) but it doesn't change a node after you create one (the trick is building the tree bottom-up)
First, we need to find which nodes belong to which level:
So in the
+/*abcd
, example, this should give you[+, /*, abcd]
. Now you can convert this into a tree bottom up:I am pretty sure we can convert this into single assignment Erlang/Haskell very easily now.
Carefully designed immutability avoids unecessary updating
Immutable data structures are only an efficiency problem if they're constantly changing, or you build them up the wrong way. For example, continually appending more to the end of a growing list is quadratic, whereas concatenating a list of lists is linear. If you think carefully, you can usually build up your structure in a sensible way, and lazy evaluation is your friend - hand out a promise to work it out and stop worrying.
Blindly trying to replicate an imperative algorithm can be ineffecient, but you're mistaken in your assertion that functional programming has to be asymptotically bad here.
Case study: pure functional GEP: Karva notation in linear time
I'll stick with your case study of parsing Karva notation for GEP. ( I've played with this solution more fully in this answer.)
Here's a fairly clean pure functional solution to the problem. I'll take the opportunity to name drop some good general recursion schemes along the way.
Code
(Importing
Data.Tree
suppliesdata Tree a = Node {rootLabel :: a, subForest :: Forest a}
wheretype Forest a = [Tree a]
.)A hylomorphism is the composition of an anamorphism (build up, unfoldr) and a catamorphism (combine, foldr). These terms are introduced to the FP community in the seminal paper Functional Programming with Bananas, Lenses and Barbed wire.
We're going to pull the levels out (ana/unfold) and combine them back together (cata/fold).
To pull out a level, we use the total arity from the previous level to find where to split off this new level, and pass on the total arity for this one ready for next time:
To combine a level (as a String) with the level below (that's already a Forest), we just pull off the number of trees that each character needs.
Now we can parse the Karva using a hylomorphism. Note that we seed it with a total arity from outside the string of
1
, since there's only one node at the root level. Correspondingly we applyhead
to the result to get this singleton back out after the hylomorphism.Linear Time
There's no exponential blowup, nor repeated O(log(n)) lookups or expensive modifications, so we shouldn't be in too much trouble.
arity
is O(1
)splitAt part
is O(part
)pullLevel (part,cs)
is O(part
) for grab usingsplitAt
to getlevel
, plus O(part
) for themap arity level
, so O(part
)combineLevel (c:cs)
is O(arity c
) for thesplitAt
, and O(sum $ map arity cs
) for the recursive callhylomorphism [] combineLevel pullLevel zero (1,cs)
pullLevel
call for each level, so the totalpullLevel
cost is O(sum
parts) = O(n)combineLevel
call for each level, so the totalcombineLevel
cost is O(sum $ map arity
levels) = O(n), since the total arity of the entire input is bound by n for valid strings.zero
(which is O(1
)), and#levels
is bound byn
, so that's below O(n) tooHence
karvaToTree
is linear in the length of the input.I think that puts to rest the assertion that you needed to use mutability to get a linear algorithm here.
Demo
Let's have a draw of the results (because Tree is so full of syntax it's hard to read the output!). You have to
cabal install pretty-tree
to getData.Tree.Pretty
.which matches the output expected from this tutorial where I found the example:
There isn't a single way to do this, it really has to be attempted case-by-case. I typically try to break them down into simpler operations using fold and unfold and then optimize from there. Karva decoding case is a breadth-first tree unfold as others have noted, so I started with treeUnfoldM_BF. Perhaps there are similar functions in Erlang.
If the decoding operation is unreasonably expensive, you could memoize the decoding and share/reuse subtrees... though it probably wouldn't fit into a generic tree unfolder and you'd need to write specialized function to do so. If the fitness function is slow enough, it may be fine to use a naive decoder like the one I have listed below. It will fully rebuild the tree each invocation.
There are a couple of solutions when mutable state in functional programming is required.
Use a different algorithm that solves the same problem. E.g. quicksort is generally regarded as mutable and may therefore be less useful in a functional setting, but mergesort is generally better suited for a functional setting. I can't tell if this option is possible or makes sense in your case.
Even functional programming languages usually provide some way to mutate state. (This blog post seems to show how to do it in Erlang.) For some algorithms and data structures this is indeed the only available option (there's active research on the topic, I think); for example hash tables in functional programming languages are generally implemented with mutable state.
In your case, I'm not so sure immutability really leads to a performance bottleneck. You are right, the (sub)tree will be recreated on update, but the Erlang implementation will probably reuse all the subtrees that haven't changed, leading to O(log n) complexity per update instead of O(1) with mutable state. Also, the nodes of the trees won't be copied but instead the references to the nodes, which should be relatively efficient. You can read about tree updates in a functional setting in e.g. the thesis from Okasaki or in his book "Purely Functional Data Structures" based on the thesis. I'd try implementing the algorithm with an immutable data structure and switch to a mutable one if you have a performance problem.
Also see some relevant SO questions here and here.