Binary trees (and hence ordered forests) can be represented as binary strings. The binary string is obtained by traversing a binary tree in preorder, recording a 1 for every node and a 0 for every empty subtree (null link).
This means that if I'm given a binary tree, I can do a preorder traversal and produce a binary sequence representation.
Is the opposite also possible? If I'm given this binary sequence 11011000101101010001
, can I draw the binary tree?
Sure!
Create a binary tree, assign a number to each element
These are your indices.
Define a size for your node data and you can get the information for a node in the tree at
data_size * index
This is commonly how heaps are represented.
There are plenty of ways to represent a binary tree as a string. However, because there are so many ways you have to agree on a particular representation before you can move forward. What I've mentioned above is just one way to represent a binary tree, and it may not be the standard that you all are using.
Yes you can.
Label the internal nodes as
a
and the external ones asb
; of course you can assumea
as0
andb
as1
(and vice-versa). But I think is easier to distinguish letters than numbers (although this matter of "taste").If you don't know what are the "external nodes", then you can assume that they are the
NULL
pointers.Now, the preorder traversal of any tree labeled as I have said forms a word belonging to Lukasiewicz language. It could be shown that this word is unique. That is, for any pair of trees
t1
andt2
,code(t1) != code(t2)
; always! In addition (and this should have been the reason for which you are concerned to Huffman encoding), the Lukasiewicz language is prefix.For example, for the tree
Its preorder traversal is
aaaabbabbaabbbababb
or000011011001110111
I leave to you the code for generating a code; it is a preorder traversal. If you are interested in reversing it, that is, to get the tree given the code, then this pseudo-code, which tries to answer your question, would do it:
In a real implementation, you would read bits.
The routine above assumes that the code is right; that is, it was generated from a binary tree. Note that not all words composed by
a
's andb
's belong to the Lukasiewicz language. But some showable rules could be applied on them.First, the number of
b
's must be exactly the number ofa
's plus one. Second, if eacha
weights one (1) and eachb
weights minus one (-1), then, at the exception of lastb
, the addition through all the word for each letter never can be smaller than zero. Only at the end of the word the addition will be-1
. If the word does not satisfy this condition, then you can assume that it is invalid.