I'm not sure how I'm going to attack the traversing of my Huffman Tree. The tree is correct, I just have a hard time figuring out how to traverse it in a good way. For some reason, my traversing method gives no result...
UPDATE: Cleaned up the code, made it more Object Oriented
Node class:
public class Node
{
public int frekvens; //Frequency
public char tegn; //Symbol
public Node venstre; //Left child
public Node høyre; //Right child
public string s; //result string
public string resultat;
public Node (char c) // Node constructor containing symbol.
{
frekvens = 1;
tegn = c;
}
public Node (int f, Node venstre, Node høyre) // Node Constructor containing frequency and children
{
frekvens = f;
this.venstre = venstre;
this.høyre = høyre;
}
public Node (Node node) // Node constructor containing a node
{
frekvens = node.frekvens;
tegn = node.tegn;
this.venstre = venstre;
this.høyre = høyre;
}
public void ØkMed1() // Inkrement frequency by one
{
frekvens = frekvens + 1;
}
public char getVenstreTegn ()
{
return venstre.tegn;
}
public char getHøyreTegn ()
{
return venstre.tegn;
}
public int getVenstreFrekvens ()
{
return venstre.frekvens;
}
public int getHøyreFrekvens ()
{
return høyre.frekvens;
}
public int getFrekvens()
{
return frekvens;
}
public bool ErTegn(char c)
{
if ( c == tegn)
{
return false;
}
else
{
return true;
}
}
//Pretty sure this does not work as intended
public string traverser (Node n) //Traverse the tree
{
if (n.tegn != '\0') //If the node containes a symbol --> a leaf
{
resultat += s;
}
else
{
if (n.getVenstreTegn() == '\0') //If left child does not have a symbol
{
s += "0";
traverser(n.venstre);
}
if (n.getHøyreTegn() == '\0') //If right child does not have a symbol
{
s += "1";
traverser(n.høyre);
}
}
return resultat;
}
public string Resultat() //Used priviously to check if i got the correct huffman tree
{
string resultat;
resultat = "Tegn: " + Convert.ToString(tegn) +" frekvens: " + Convert.ToString(frekvens) + "\n";
return resultat;
}
}
Huffman_Tree Class:
public class Huffman_Tre
{
string treString;
List<Node> noder = new List<Node>();
public Node rot;
public void bygg (string input)
{
bool funnet; //Found
char karakter; //character
for (int i = 0; i < input.Length;i++) //Loops through string and sets character
//with coresponding freqeuncy in the node list
{
karakter = input[i];
funnet = false; //default
for (int j = 0; j< noder.Count; j++)
{
if (noder[j].ErTegn(karakter) == false) //if the character already exists
{
noder[j].ØkMed1(); //inkrement frequency by one
funnet = true;
break;
}
}
if (!funnet) //if the character does not exist
{
noder.Add(new Node(karakter)); //add the character to list
}
}
//Sorting node list acending by frequency
var sortertListe = noder.OrderBy(c => c.frekvens).ToList();
noder = sortertListe;
do
{
noder.Add(new Node((noder[0].frekvens + noder[1].frekvens), noder[0],noder[1]));
//Remove the leaf nodes
noder.RemoveAt(0);
noder.RemoveAt(0);
} while(noder.Count >= 2);
}
public Node getRot()
{
return rot;
}
public string visTre()
{
foreach (Node node in noder)
{
treString += node.Resultat();
}
return treString;
}
public bool erNull()
{
if (noder[0].tegn == '\0')
{
return true;
}
else
return false;
}
}
Main Program:
private void btnKomprimer_Click(object sender, System.Windows.RoutedEventArgs e)
{
string input; //The string input I want to compress
input = txtInput.Text; //initialize input to text input
input = input.ToLower();
txtOutput.Text = "";
Huffman_Tre tre = new Huffman_Tre();
tre.bygg(input);
Node rot = new Node(tre.getRot());
txtOutput.Text += rot.traverser(rot);
}
}
As I had a little bit of time left, I worked out an example of a Huffman tree, while playing with C# 6.0. It's not optimized (not even by far!), but it works fine as an example. And it will help you to look where your 'challenge' may arise. As my English is far better than my Scandinavian knowledge, I used English naming, I hope you don't mind.
First, let's start with the class that keeps the frequencies.
Please note that the ToString() method uses an extension method that is able to 'dump' the contents of the Dictionary used. The extensions is located in a static class called Helpers and looks like this:
Now, with this FrequencyTable in place, we can start looking on how to build up the Nodes. Huffman works with a binary tree, so it's best to generate a Node class having a left and right child node. I also took the liberty to perform the traversal algorithm here as well. This class is built up as following:
I use the Node class within the HuffmanTree class. As, logically, a tree is built up from nodes. The corresponding HuffmanTree is written this way:
What is interesting in this code, is that I decided to use
BitArrays
that will hold the binary paths for the tree when it's build up. Thepublic BitArray Encode(string source)
method here contains a dirty hack. I keep track of the total amount of bits used for encoding and store this within theBitCountForTree
property. When performing a decode, I'll use this property to remove any trailing bits that may arise. There is a way nicer way to perform this, but I'll leave that open for you to find out.Also, this class makes use of an extension method written for the HuffmanNode. It's a simple one though:
This extension method is convenient to determine whether or not a given node is actually a leafnode. A leaf is a node which has no childnodes left and thus the end of a binary tree (or better a branch of that tree).
Now the interesting part, how do I make things work here. I have build a Windows Forms application having 3 textboxes. One for the actual input, one for the binary (encoded) output and the last for showing the compressed result. I also placed two simple buttons, one to perform the Huffman encoding and one for the Huffman decoding.
The Huffman encoding method is written as following (just in the eventhandler of the encode button):
Note that the encoded binary string does not have any trailing bits. I'll leave that up to the Encoding mechanisms of C#. The downside of this, is that I have to keep track of it when decoding.
The decoding is not too hard now as well. Although, for this example, I am making use of the compressed output generated by the encoding code placed above. Also, I am assuming that the Huffman tree (and it's frequency table!!!) are already built. Normally, the frequency table is stored within the compressed file, so that it can be rebuild.
Please pay attention to the dirty hack I created, as the
Encoding.Default.GetBytes(tbOutput.Text)
surely generates a byte array, it may contain trailing bits which need not to be decoded. Hence that I only take the amount of bits that I will actually need, based upon the rebuild tree.So when running, my example provides the following output, when using the 'world renown sentence' "The quick brown fox jumps over the lazy programmer":
After pressing the "Huff encode" button:
And after pressing the "Huff decode" button:
Now this code can really use some optimizations, as you might consider using Arrays instead of Dictionaries. There are more, but it's up to you for consideration.