Hello, I am trying to implement canonical Huffman encoding, but I don't understand the Wikipedia article or the guides I found on Google. I need a more abstract explanation.
Here is what I tried: 1. Get the list of codes and code lengths from regular Huffman encoding, like this:
A - code: 110, length: 3.
B - code: 111, length: 3.
C - code: 10, length 2.
D - code: 01, length 2.
E - code: 00, length 2.
2. Sort the table by length and then by symbol, like this:
C - code: 10, length 2.
D - code: 01, length 2.
E - code: 00, length 2.
A - code: 110, length: 3.
B - code: 111, length: 3.
Now I don't know how to proceed.
Thanks a lot.
Throw out the codes you get from the Huffman algorithm. You don't need those. Just keep the lengths.
Now assign the codes based on the lengths and the symbols. Sort by length, from shortest to longest, and within each length, sort the symbols in ascending order. (How you do that exactly doesn't matter, so long as every symbol is strictly less than or greater than any other symbol, and the encoder and decoder agree on how to do it.)
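So we do the ordering:
C - length 2
D - length 2
E - length 2
A - length 3
B - length 3
The 2's come before the 3's; within the 2's, C, D, E are in order, and within the 3's, A, B are in order.
Now we assign the codes in integer order within each length, adding a zero bit at the end each time we go up a length:
C - length 2 - code 00
D - length 2 - code 01
E - length 2 - code 10
A - length 3 - code 110
B - length 3 - code 111
That is a canonical code.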
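The assignment described above can be sketched in a few lines of Python. This is only a minimal illustration, not the one true implementation: the function name `canonical_codes` and the choice to return the codes as bit strings are my own assumptions, and the symbol order within a length is plain string comparison.

```python
def canonical_codes(lengths):
    """Assign canonical codes from a {symbol: bit length} mapping."""
    # Sort by length first, then by symbol within each length.
    ordered = sorted(lengths.items(), key=lambda item: (item[1], item[0]))
    if not ordered:
        return {}
    codes = {}
    code = 0
    prev_len = ordered[0][1]
    for symbol, length in ordered:
        # Going up a length appends zero bits to the running code.
        code <<= length - prev_len
        codes[symbol] = format(code, "0{}b".format(length))
        code += 1  # next code in integer order
        prev_len = length
    return codes


lengths = {"A": 3, "B": 3, "C": 2, "D": 2, "E": 2}
print(canonical_codes(lengths))
# {'C': '00', 'D': '01', 'E': '10', 'A': '110', 'B': '111'}
```

Note that the input is only the lengths; the codes from the original Huffman tree never appear.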
You could do it other ways if you like and still be canonical, e.g. counting backwards from 11, so long as the encoder and decoder agree on the approach. The whole point is to only have to transmit the lengths for each symbol from the encoder to the decoder, so as to not have to transmit the codes themselves which take more space.
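To make the "only transmit the lengths" point concrete, here is a rough sketch of the decoder side in Python. It assumes the decoder received just the `{symbol: length}` table and uses the same ordering rule as the encoder (shortest length first, then ascending symbol). The names `build_decoder` and `decode`, and the use of a string of '0'/'1' characters as the bit stream, are hypothetical choices made for readability.

```python
def build_decoder(lengths):
    """Rebuild the code -> symbol table from the code lengths alone."""
    ordered = sorted(lengths.items(), key=lambda item: (item[1], item[0]))
    table = {}
    code = 0
    prev_len = ordered[0][1] if ordered else 0
    for symbol, length in ordered:
        code <<= length - prev_len  # append zero bits when the length grows
        table[format(code, "0{}b".format(length))] = symbol
        code += 1
        prev_len = length
    return table


def decode(bits, lengths):
    """Decode a string of '0'/'1' characters using only the lengths."""
    table = build_decoder(lengths)
    out = []
    current = ""
    for bit in bits:
        current += bit
        # The code is prefix-free, so the first match is the right one.
        if current in table:
            out.append(table[current])
            current = ""
    return "".join(out)


lengths = {"A": 3, "B": 3, "C": 2, "D": 2, "E": 2}
print(decode("0001110111", lengths))  # CDAB
```

A real decoder would work on packed bits and usually use per-length counts instead of a dictionary, but the idea is the same: both sides derive identical codes from the lengths.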
You should sort the symbols by their frequency, so that the most frequent is at the top and the least frequent is at the bottom. (Overall frequency - 1):
Then mark one of the two least frequent symbols with 0 and the other with 1, sum their frequencies, insert the sum into its proper position in the list, and again mark the two least frequent entries with 0 and 1:
And again...
Until you obtain the last pair. The path marked by 0 and 1 from the tail to a symbol is the corresponding Huffman code.
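The merging procedure above can be sketched with a priority queue in Python. This version tracks only the code lengths (each merge pushes every symbol under it one level deeper), which is all a canonical code needs. The function name `huffman_code_lengths` and the example frequencies are made up for illustration, and the counter used to break ties is my own assumption; a different tie-breaking rule can give different but equally optimal lengths.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length} for a {symbol: frequency} mapping."""
    if len(freqs) == 1:
        # A lone symbol still needs one bit.
        return {sym: 1 for sym in freqs}
    tiebreak = count()  # keeps heap entries comparable when frequencies tie
    heap = [(f, next(tiebreak), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {sym: 0 for sym in freqs}
    while len(heap) > 1:
        # Take the two least frequent entries and merge them.
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1  # one more bit on the path to each symbol
        heapq.heappush(heap, (f1 + f2, next(tiebreak), syms1 + syms2))
    return lengths


# Hypothetical frequencies, chosen to reproduce the lengths in the question.
freqs = {"A": 1, "B": 1, "C": 2, "D": 2, "E": 2}
print(huffman_code_lengths(freqs))
# {'A': 3, 'B': 3, 'C': 2, 'D': 2, 'E': 2}
```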