What Self Balancing Tree is simplest in Functional

2019-03-11 01:19发布

问题:

I'm designing a self balancing tree in Haskell. As an exercise and because it is nice to have in your back hand.

Previously in C and Python I preferred Treaps and Splay Trees due to their simple balancing rules. I always disliked R/B Trees, since they seemed like more work than they were worth.

Now, due to the functional nature of Haskell, things seem to have changed. I can write a R/B insert function in 10 lines of code. Treaps on the other hand requires wrapping to store the random number generator, and Splay Trees are a pain to do top-down.

So I'm asking if you have experience with other types of trees? Which ones are better at utilizing the pattern matching and top-down nature of functional languages?

回答1:

Ok, I guess there wasn't a lot of references or research for answering this question. Instead I've taken the time to try your different ideas and trees. I didn't find anything a lot better than RB trees, but perhaps that's just search bias.

The RB tree can be (insertion) balanced with four simple rules, as shown by Chris Okasaki:

balance T (T R (T R a x b) y c) z d = T R (T B a x b) y (T B c z d)
balance T (T R a x (T R b y c)) z d = T R (T B a x b) y (T B c z d)
balance T a x (T R b y (T R c z d)) = T R (T B a x b) y (T B c z d)
balance T a x (T R (T R b y c) z d) = T R (T B a x b) y (T B c z d)
balance T a x b = T B a x b

AVL trees can be balanced in a similar pattern matching way. However the rules don't compress as well:

balance T (T (T a x b   dx) y c (-1)) z d (-2) = T (T a x b dx) y (T c z d  0) 0
balance T a x (T b y (T c z d   dz)   1 )   2  = T (T a x b  0) y (T c z d dz) 0
balance T (T a x (T b y c   1 )   1 ) z d (-2) = T (T a x b -1) y (T c z d  0) 0
balance T (T a x (T b y c (-1))   1 ) z d (-2) = T (T a x b  0) y (T c z d  1) 0
balance T (T a x (T b y c   _ )   1 ) z d (-2) = T (T a x b  0) y (T c z d  0) 0
balance T a x (T (T b y c   1 ) z d (-1))   2  = T (T a x b -1) y (T c z d  0) 0
balance T a x (T (T b y c (-1)) z d (-1))   2  = T (T a x b  0) y (T c z d  1) 0
balance T a x (T (T b y c   _ ) z d (-1))   2  = T (T a x b  0) y (T c z d  0) 0
balance t = t

As AVL trees seams to generally be considered inferior to RB trees, they are probably not worth the extra hassle.

AA trees could theoretically be balanced nice and easily by:

balance T n (T n a x b) y c = T n a x (T n b y c) -- skew
balance T n a x (T n b y (T n c z d)) = T (n+1) (T n a x b) y (T n c z d) --split
balance T n a x b = T n a x b

But unfortunately Haskell don't like the overloading of n. It is possible that a less standard implementation of AA trees, not using ranks, but something more similar to R and B, would work well.

Splay trees are difficult because you need to focus on a single node, rather than the static structure of the tree. It can be done by merging insert and splay.

Treaps are also uneasy to do in a functional environment, as you don't have a global random generator, but need to keep instances in every node. This can be tackled by leaving the task of generating priorities to the client, but even then, you can't do priority comparison using pattern matching.



回答2:

As you say Red Black trees aren't that hard to use. Have you given finger trees a look? You might be interested in augmenting your base data structure with something like a zipper. Another tree you might find interesting is the AA tree it is a simplification of Red Black Trees.



回答3:

It's the one that's already implemented.

There are fine implementations in Haskell of balanced trees such as Data.Map and Data.Set. Don't they fulfill your needs? Don't reimplement, reuse.



回答4:

The OCaml standard library uses an AVL tree for its map functor. It seems as though it's easier to implement than an RB-tree if you include a remove operation.