I've written a simple Huffman encoding in Ruby. As output I've got an array, for example:
["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
I need to write, and then read, it to and from a file. I tried several methods:
IO.binwrite("out.cake", array)
I get a simple text file and not binary.
Or:
File.open("out.cake", 'wb') do |output|
  array.each do |byte|
    output.print byte.chr
  end
end
This looks like it works, but then I can't read it back into an array.
Which encoding should I use?
I think you can just use Array#pack and String#unpack, like the following code:
I don't know your preferred format for the result of reading, and I know the above method is inefficient. But anyway, you can take "0" or "1" sequentially from the result of unpack to traverse your Huffman tree.
If you want bits, then you have to do both packing and unpacking manually. Neither Ruby nor any other common-use language will do it for you.
Your array contains strings that are groups of characters, but you need to build an array of bytes and write those bytes into the file.
From this:
["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
you should build these bytes:
01011111 01011011 10001110 00010011
Since it's just four bytes, you can put them into a single 32-bit number:
01011111010110111000111000010011
that is, 5F5B8E13 in hex.
Both samples of your code do different things. The first one writes a string representation of a Ruby array into the file. The second one writes one byte per array element, because String#chr returns only the first character of each string; each byte is either 48 ('0') or 49 ('1').
If you want bits, then your output file size should be just four bytes.
Read about bit operations to learn how to achieve that.
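As a rough illustration of the bit operations involved, here is a minimal sketch that shifts each bit into an accumulator and emits a byte every eight bits. The method name pack_bits is hypothetical, and padding a final partial byte with zero bits on the right is an assumption, not something the question specifies.

```ruby
# Hypothetical helper: turn an array of "0"/"1" strings into an array of byte values.
def pack_bits(codes)
  bytes = []
  current = 0 # bits collected so far for the byte being built
  count = 0   # how many bits `current` holds

  codes.each do |code|
    code.each_char do |ch|
      # Shift the accumulator left and OR in the next bit.
      current = (current << 1) | (ch == "1" ? 1 : 0)
      count += 1
      next unless count == 8

      bytes << current
      current = 0
      count = 0
    end
  end

  # Assumption: left-align leftover bits in a last byte, padded with zeros.
  bytes << (current << (8 - count)) if count > 0
  bytes
end

codes = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]
bytes = pack_bits(codes)
# For this input, bytes is [0x5F, 0x5B, 0x8E, 0x13].
```

The resulting values can then be written with something like File.binwrite("out.cake", bytes.pack("C*")), which stores one byte per array element.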
Here is a draft. I didn't test it. Something may be wrong.
Note: seven zeros are added to handle the case when the total number of characters is not divisible by 8. Without those zeros, bit_sequence.scan(/.{8}/) will drop the remaining characters.
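A minimal sketch of the approach this note describes (not necessarily the original draft) might look like the following. The helper names write_bits and read_bits are hypothetical, and returning the bit count so the reader can discard the padding is an assumption; a real file format would need to store that count somewhere.

```ruby
require "tempfile"

# Hypothetical helper: join the codes, pad, cut into bytes, and write them to `path`.
def write_bits(path, codes)
  bit_sequence = codes.join
  # Seven zeros, so that scan(/.{8}/) does not drop a trailing partial group.
  padded = bit_sequence + "0" * 7
  bytes = padded.scan(/.{8}/).map { |group| group.to_i(2) }
  File.binwrite(path, bytes.pack("C*"))
  bit_sequence.length # assumption: the caller keeps this to strip padding later
end

# Hypothetical helper: read the file back as a string of "0"/"1" characters.
def read_bits(path, bit_count)
  # "B*" expands every byte into eight characters, most significant bit first.
  File.binread(path).unpack1("B*")[0, bit_count]
end

codes = ["010", "1111", "10", "10", "110", "1110", "001", "110", "000", "10", "011"]

file = Tempfile.new("cake") # temporary file used only for this demonstration
file.close
bit_count = write_bits(file.path, codes)
bits = read_bits(file.path, bit_count)
file.unlink
```

Here bits is the concatenated bit string, not the original array. Splitting it back into individual codes still requires walking the Huffman tree one "0" or "1" at a time.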