Huffman Tree: Traversing [closed]

2019-08-01 01:41发布

I'm not sure how I'm going to attack the traversing of my Huffman Tree. The tree is correct, I just have a hard time figuring out how to traverse it in a good way. For some reason, my traversing method gives no result...

UPDATE: Cleaned up the code, made it more Object Oriented

Node class:

public class Node
{
    public int frekvens; //Frequency
    public char tegn; //Symbol
    public Node venstre; //Left child
    public Node høyre; //Right child
    public string s; //result string
    public string resultat;
    public Node (char c) // Node constructor containing symbol.
    {
        frekvens = 1;
        tegn = c;
    }

    public Node (int f, Node venstre, Node høyre) // Node Constructor containing frequency and children
        {
            frekvens = f;
            this.venstre = venstre;
            this.høyre = høyre;
        }

    public Node (Node node) // Node constructor containing a node
        {
            frekvens = node.frekvens;
            tegn = node.tegn;
            this.venstre = venstre;
            this.høyre = høyre;
        }

    public void ØkMed1() // Inkrement frequency by one
    {
        frekvens = frekvens + 1;
    }


    public char getVenstreTegn ()
    {
        return venstre.tegn;
    }

    public char getHøyreTegn ()
    {
        return venstre.tegn;
    }

    public int getVenstreFrekvens ()
    {
        return venstre.frekvens;
    }

    public int getHøyreFrekvens ()
    {
        return høyre.frekvens;
    }

    public int getFrekvens()
    {
        return frekvens;
    }


    public bool ErTegn(char c)
    {
        if ( c == tegn)
        {
            return false;
        }
        else
        {
            return true;
        }
    }

    //Pretty sure this does not work as intended
    public string traverser (Node n) //Traverse the tree
    {
        if (n.tegn != '\0') //If the node containes a symbol --> a leaf
        {
            resultat += s;  
        }
        else
        {
            if (n.getVenstreTegn() == '\0') //If left child does not have a symbol
            {
                s += "0";
                traverser(n.venstre);
            }
            if (n.getHøyreTegn() == '\0') //If right child does not have a symbol
            {
                s += "1";
                traverser(n.høyre);
            }
        }
        return resultat;
    }
    public string Resultat() //Used priviously to check if i got the correct huffman tree
    {
        string resultat;
        resultat = "Tegn: " + Convert.ToString(tegn) +"  frekvens: " + Convert.ToString(frekvens) + "\n";
        return resultat;
    }
}

Huffman_Tree Class:

public class Huffman_Tre
{
    string treString;
    List<Node> noder = new List<Node>();
    public Node rot;
    public void bygg (string input)
    {
        bool funnet; //Found
        char karakter; //character

        for (int i = 0; i < input.Length;i++) //Loops through string and sets character
            //with coresponding freqeuncy in the node list
        {   
            karakter = input[i];
            funnet = false; //default
            for (int j = 0; j< noder.Count; j++)
            {
                if (noder[j].ErTegn(karakter) == false) //if the character already exists
                {
                    noder[j].ØkMed1(); //inkrement frequency by one
                    funnet = true; 
                    break;
                }
            }
            if (!funnet) //if the character does not exist 
            {
                noder.Add(new Node(karakter)); //add the character to list
            }
        }
        //Sorting node list acending by frequency
        var sortertListe = noder.OrderBy(c => c.frekvens).ToList();

        noder = sortertListe; 

        do
        {
            noder.Add(new Node((noder[0].frekvens + noder[1].frekvens), noder[0],noder[1]));

            //Remove the leaf nodes
            noder.RemoveAt(0);
            noder.RemoveAt(0); 
        } while(noder.Count >= 2);

    }

    public Node getRot()
    {
        return rot;
    }

    public string visTre()
    {

        foreach (Node node in noder)
        {
            treString += node.Resultat();
        }
        return treString;
    }
    public bool erNull()
    {
        if (noder[0].tegn == '\0')
        {
            return true;
        }
        else
            return false;
    }
}

Main Program:

private void btnKomprimer_Click(object sender, System.Windows.RoutedEventArgs e)
    {
        string input; //The string input I want to compress
        input = txtInput.Text; //initialize input to text input
        input = input.ToLower(); 
        txtOutput.Text = "";

        Huffman_Tre tre = new Huffman_Tre();

        tre.bygg(input);

        Node rot = new Node(tre.getRot());

        txtOutput.Text += rot.traverser(rot);
    }
}

1条回答
放荡不羁爱自由
2楼-- · 2019-08-01 02:09

As I had a little bit of time left, I worked out an example of a Huffman tree, while playing with C# 6.0. It's not optimized (not even by far!), but it works fine as an example. And it will help you to look where your 'challenge' may arise. As my English is far better than my Scandinavian knowledge, I used English naming, I hope you don't mind.

First, let's start with the class that keeps the frequencies.

public sealed class HuffmanFrequencyTable
{       
    #region Properties
    /// <summary>
    /// Holds the characters and their corresponding frequencies
    /// </summary>
    public Dictionary<char, int> FrequencyTable  { get; set; } = new Dictionary<char, int>();

    #endregion

    #region Methods

    /// <summary>
    /// Clears the internal frequency table
    /// </summary>
    public void Clear()
    {
        FrequencyTable?.Clear();            
    }

    /// <summary>
    /// Accepts and parses a new line (string) which is then 
    /// merged with the existing dictionary or frequency table
    /// </summary>
    /// <param name="line">The line to parse</param>
    public void Accept(string line)
    {
        if (!string.IsNullOrEmpty(line))
        {
            line.GroupBy(ch => ch).
                 ToDictionary(g => g.Key, g => g.Count()).
                 ToList().
                 ForEach(x => FrequencyTable[x.Key] = x.Value);
        }
    }

    /// <summary>
    /// Performs a dump of the frequency table, ordering all characters, lowest frequency first.
    /// </summary>
    /// <returns>The frequency table in the format 'character [frequency]'</returns>
    public override string ToString()
    {            
        return FrequencyTable?.PrintFrequencies();
    }
    #endregion
}

Please note that the ToString() method uses an extension method that is able to 'dump' the contents of the Dictionary used. The extensions is located in a static class called Helpers and looks like this:

/// <summary>
/// Extension method that helps to write the contents of a generic Dictionary to a string, ordered by it's values and 
/// printing the key and it's value between brackets.
/// </summary>
/// <typeparam name="TKey">Generic key</typeparam>
/// <typeparam name="TValue">Generic value type</typeparam>
/// <param name="dictionary">The dictionary</param>
/// <exception cref="ArgumentNullException">Throws an argument null exception if the provided dictionary is null</exception>
/// <returns></returns>
public static string PrintFrequencies<TKey, TValue>(this IDictionary<TKey, TValue> dictionary)
{
    if (dictionary == null)
        throw new ArgumentNullException("dictionary");

    var items = from kvp in dictionary
                orderby kvp.Value
                select kvp.Key + " [" + kvp.Value + "]";

    return string.Join(Environment.NewLine, items);
}

Now, with this FrequencyTable in place, we can start looking on how to build up the Nodes. Huffman works with a binary tree, so it's best to generate a Node class having a left and right child node. I also took the liberty to perform the traversal algorithm here as well. This class is built up as following:

public sealed class HuffmanNode
{
    #region Properties

    /// <summary>
    /// Holds the left node, if applicable, otherwise null
    /// </summary>
    public HuffmanNode Left { get; set; } = null;

    /// <summary>
    /// Holds the right node, if applicable, otherwise null
    /// </summary>
    public HuffmanNode Right { get; set; } = null;

    /// <summary>
    /// Holds the Character (or null) for this particular node
    /// </summary>
    public char? Character { get; set; } = null;

    /// <summary>
    /// Holds the frequency for this particular node, defaulted to 0
    /// </summary>
    public int Frequency { get; set; } = default(int);

    #endregion

    #region Methods

    /// <summary>
    /// Traverses all nodes recursively returning the binary 
    /// path for the corresponding character that has been found.
    /// </summary>
    /// <param name="character">The character to find</param>
    /// <param name="data">The datapath (containing '1's and '0's)</param>
    /// <returns>The complete binary path for a character within a node</returns>
    public List<bool> Traverse(char? character, List<bool> data)
    {
        //Check the leafs for existing characters
        if (null == Left && null == Right)
        {
            //We're at an endpoint of our 'tree', so return it's data or nothing when the symbol
            //characters do not match
            return (bool)character?.Equals(Character) ? data : null;
        }
        else
        {
            List<bool> left = null;
            List<bool> right = null;

            //TODO: If possible refactor with proper C# 6.0 features
            if (null != Left)
            {
                List<bool> leftPath = new List<bool>(data);                    
                leftPath.Add(false); //Add a '0'
                left = Left.Traverse(character, leftPath); //Recursive traversal for child nodes within this left node.
            }

            if (null != Right)
            {
                List<bool> rightPath = new List<bool>(data);                    
                rightPath.Add(true); //Add a '1'
                right = Right.Traverse(character, rightPath); //Recursive traversal for childnodes within this right node
            }

            return (null != left) ? left : right;
        }
    }        

    #endregion
}

I use the Node class within the HuffmanTree class. As, logically, a tree is built up from nodes. The corresponding HuffmanTree is written this way:

public sealed class HuffmanTree
{
    #region Fields
    /// <summary>
    /// Field for keeping the Huffman nodes in. Internally used.
    /// </summary>
    private List<HuffmanNode> nodes = new List<HuffmanNode>();        
    #endregion

    #region Properties

    /// <summary>
    /// Holds the Huffman tree
    /// </summary>
    public HuffmanNode Root   {  get; set; } = null;

    /// <summary>
    /// Holds the frequency table for all parsed characters
    /// </summary>
    public HuffmanFrequencyTable Frequencies { get; private set; } = new HuffmanFrequencyTable()

    /// <summary>
    /// Holds the amount of bits after encoding the tree.
    /// Primary usable for decoding.
    /// </summary>
    public int BitCountForTree  { get;  private set;  } = default(int);

    #endregion

    #region Methods

    /// <summary>
    /// Builds the Huffman tree
    /// </summary>
    /// <param name="source">The source to build the Hufftree from</param>
    /// <exception cref="ArgumentNullException">Thrown when source is null or empty</exception>
    public void BuildTree(string source)
    {
        nodes.Clear(); //As we build a new tree, first make sure it's clean :)

        if (string.IsNullOrEmpty(source))
            throw new ArgumentNullException("source");
        else
        {
            Frequencies.Accept(source);

            foreach (KeyValuePair<char, int> symbol in Frequencies.FrequencyTable)
            {
                nodes.Add(new HuffmanNode() { Character = symbol.Key, Frequency = symbol.Value });
            }

            while (nodes.Count > 1)
            {
                List<HuffmanNode> orderedNodes = nodes.OrderBy(node => node.Frequency).ToList();

                if (orderedNodes.Count >= 2)
                {
                    List<HuffmanNode> takenNodes = orderedNodes.Take(2).ToList();

                    HuffmanNode parent = new HuffmanNode()
                    {
                        Character = null,
                        Frequency = takenNodes[0].Frequency + takenNodes[1].Frequency,
                        Left = takenNodes[0],
                        Right = takenNodes[1]
                    };

                    //Remove the childnodes from the original node list and add the new parent node
                    nodes.Remove(takenNodes[0]);
                    nodes.Remove(takenNodes[1]);
                    nodes.Add(parent);
                }
            }

            Root = nodes.FirstOrDefault();
        }
    }

    /// <summary>
    /// Encodes a given string to the corresponding huffman encoding path
    /// </summary>
    /// <param name="source">The source to encode</param>
    /// <returns>The binary huffman representation of the source</returns>
    public BitArray Encode(string source)
    {
        if (!string.IsNullOrEmpty(source))
        {
            List<bool> encodedSource = new List<bool>();
            //Traverse the tree for each character in the passed source (string) and add the binary path to the encoded source
            encodedSource.AddRange(source.SelectMany(character =>
                                        Root.Traverse(character, new List<bool>())
                                    ).ToList()
                                  );
            //For decoding, we might need the amount of bits to skip trailing bits.
            BitCountForTree = encodedSource.Count;
            return new BitArray(encodedSource.ToArray());
        }
        else return null;
    }

    /// <summary>
    /// Decodes a given binary path to represent it's string value
    /// </summary>
    /// <param name="bits">BitArray for traversing the tree</param>
    /// <returns></returns>
    public string Decode(BitArray bits)
    {
        HuffmanNode current = Root;
        string decodedString = string.Empty;

        foreach (bool bit in bits)
        {
            //Find the correct current node depending on the bit set or not set.
            current = (bit ? current.Right ?? current : current.Left ?? current);               

            if (current.IsLeaf())
            {
                decodedString += current.Character;
                current = Root;
            }
        }

        return decodedString;
    }

    #endregion
}

What is interesting in this code, is that I decided to use BitArrays that will hold the binary paths for the tree when it's build up. The public BitArray Encode(string source) method here contains a dirty hack. I keep track of the total amount of bits used for encoding and store this within the BitCountForTree property. When performing a decode, I'll use this property to remove any trailing bits that may arise. There is a way nicer way to perform this, but I'll leave that open for you to find out.

Also, this class makes use of an extension method written for the HuffmanNode. It's a simple one though:

  /// <summary>
  /// Determines whether a given Huffman node is a leaf or not.
  /// A node is considered to be a leaf when it has no childnodes
  /// </summary>
  /// <param name="node">A huffman node</param>
  /// <returns>True if no children are left, false otherwise</returns>
  public static bool IsLeaf(this HuffmanNode node)
  {
      return (null == node.Left && null == node.Right);
  }

This extension method is convenient to determine whether or not a given node is actually a leafnode. A leaf is a node which has no childnodes left and thus the end of a binary tree (or better a branch of that tree).

Now the interesting part, how do I make things work here. I have build a Windows Forms application having 3 textboxes. One for the actual input, one for the binary (encoded) output and the last for showing the compressed result. I also placed two simple buttons, one to perform the Huffman encoding and one for the Huffman decoding.

The Huffman encoding method is written as following (just in the eventhandler of the encode button):

 string input = tbInput.Text;
 Tree.BuildTree(input); //Build the huffman tree

 BitArray encoded = Tree.Encode(input); //Encode the tree

 //First show the generated binary output
 tbBinaryOutput.Text = string.Join(string.Empty, encoded.Cast<bool>().Select(bit => bit ? "1" : "0"));

  //Next, convert the binary output to the new characterized output string.       
  byte[] bytes = new byte[(encoded.Length / 8) + 1];
  encoded.CopyTo(bytes, 0);

  tbOutput.Text = Encoding.Default.GetString(bytes); //Write the compressed output to the textbox.

Note that the encoded binary string does not have any trailing bits. I'll leave that up to the Encoding mechanisms of C#. The downside of this, is that I have to keep track of it when decoding.

The decoding is not too hard now as well. Although, for this example, I am making use of the compressed output generated by the encoding code placed above. Also, I am assuming that the Huffman tree (and it's frequency table!!!) are already built. Normally, the frequency table is stored within the compressed file, so that it can be rebuild.

 //First convert the compressed output to a bit array again again and skip trailing bits.            
 bool[] boolAr = new BitArray(Encoding.Default.GetBytes(tbOutput.Text)).Cast<bool>().Take(Tree.BitCountForTree).ToArray();
 BitArray encoded = new BitArray( boolAr );

 string decoded = Tree.Decode(encoded);
 MessageBox.Show(decoded, "Decoded result: ", MessageBoxButtons.OK, MessageBoxIcon.Information);

Please pay attention to the dirty hack I created, as the Encoding.Default.GetBytes(tbOutput.Text) surely generates a byte array, it may contain trailing bits which need not to be decoded. Hence that I only take the amount of bits that I will actually need, based upon the rebuild tree.

So when running, my example provides the following output, when using the 'world renown sentence' "The quick brown fox jumps over the lazy programmer":

After pressing the "Huff encode" button:

HuffEncode

And after pressing the "Huff decode" button:

HuffDecode

Now this code can really use some optimizations, as you might consider using Arrays instead of Dictionaries. There are more, but it's up to you for consideration.

查看更多
登录 后发表回答