problem in saving Huffman Code?

2019-05-23 09:56发布

问题:

I want to save Huffman codes into a file. How can I do this? I am saving Huffman codes into a string but size of generated file is bigger than Original file.

回答1:

A very simple approach is to write one bit at a time with something like the following:

unsigned char acc; // Accumulator of bit waiting to be written
int bitcount;      // How many bits are aready present in the accumulator

// write a single bit (0/1)
void writebit(int bit)
{
    acc |= (bit << bitcount);
    if (++bitcount == 8)
    {
        writebyte(acc);
        acc = 0;
        bitcount = 0;
    }
}

to read back a sigle bit the procedure is symmetrical

unsigned char acc;   // bits waiting to be extracted
int bitcount;        // how many bits are still available in acc

int readbit()
{
   if (bitcount == 0)
   {
       bitcount = 8;
       acc = readbyte();
   }
   --bitcount;
   return (acc >> (7 - bitcount)) & 1;
}

of course this is just the simplest approach, but I'd wait before worrying about code speed until you are first able to save and load correctly your encoded data.

Example:

Suppose you have the following Huffman coded symbols

A - 0
B - 10
C - 110
D - 111

and that you want to encode the sequence

A B A A C D A D B B

then you would call in order

writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(0);                           // A
writebit(0);                           // A
writebit(1); writebit(1); writebit(0); // C
writebit(1); writebit(1); writebit(1); // D
writebit(0);                           // A
writebit(1); writebit(0);              // B
writebit(1); writebit(0);              // B

The actual bytes written would therefore be

(01100010) = 0x62
(01010111) = 0x57

(Note that the code shown starts from the least significant bit, i.e. you should read the bit sequences inside the parenthesis from right to left if you want to recognize the symbols).



回答2:

The answer to Efficient way of storing Huffman tree should help you.



回答3:

I believe what you are saving is a string of 1's and 0's. A true huffman code needs to be saved in binary and then parsed later on. If you are merely saving the output as a string you are defeating the purpose of a huffman code, each 0 and 1 is 8bits instead of 1.



回答4:

You are probably saving the entire byte for each pattern/letter.

Let's say e is the most common letter. It will have a bit pattern of 0.

Let's say z is the least common letter it will have some pattern starting with 1. Let's just assign it to be 1111 111.

The file you want to be writing will be this:

0111 1111

You are PROBABLY writing this:

0000 0000 0111 1111.

You need to take advantage of bitwise operations to do this.