I am able to compress all kinds of files (.jpg, .mp4 etc.) but when I try to decompress these non-text files the program just returns an empty decompressed file... the weird part is that I am able to decompress plain text files just fine.
When I compress my original file I put both the data needed to reconstruct the tree and the encoded bits in the same file. The format looks something like this:
<n><value 1><frequency 1>...<value n><frequency n>[the compressed bytes]
Where n is the total number of unique bytes (AKA number of leafs in my tree), value is just my leaf values in byte form and frequency is the frequency of each byte/"character" (frequency is an int value so it consists of 4 bytes per frequency).
BitFileReader and BitFileWriter in my code are just wrapper classes for a BufferedOutStream/InputStream with the added functionality of reading/writing bit by bit.
I added my whole Huffman code below but the main focus is the compress() and decompress() methods at the bottom. At the very least I want to know if my logic for these methods is fine, and if so what is causing it to return empty decompressed files when decompressing other file types (that aren't plain text files)?
Huffman Code:
public class HuffmanCode {
public static Tree buildTree(int[] charFreqs) {
PriorityQueue<Tree> trees = new PriorityQueue<Tree>();
for (int i = 0; i < charFreqs.length; i++){
if (charFreqs[i] > 0){
trees.offer(new Leaf(charFreqs[i], i));
}
}
//assert trees.size() > 0;
while (trees.size() > 1) {
Tree a = trees.poll();
Tree b = trees.poll();
trees.offer(new Node(a, b));
}
return trees.poll();
}
public static void printStruct(Tree tree) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
System.out.println(leaf.value + " " + leaf.frequency);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
printStruct(node.left);
// traverse right
printStruct(node.right);
}
}
public static void printStruct(Tree tree, StringBuffer prefix) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
System.out.println(leaf.value + "\t" + leaf.frequency + "\t" + prefix);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
printStruct(node.left, prefix);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
printStruct(node.right, prefix);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void fillEncodeMap(Tree tree, StringBuffer prefix, TreeMap<Integer, String> treeMap) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
treeMap.put(leaf.value, prefix.toString());
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
fillEncodeMap(node.left, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
fillEncodeMap(node.right, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void fillDecodeMap(Tree tree, StringBuffer prefix, TreeMap<String, Integer> treeMap) {
//assert tree != null;
if (tree instanceof Leaf) {
Leaf leaf = (Leaf)tree;
treeMap.put(prefix.toString(), leaf.value);
} else if (tree instanceof Node) {
Node node = (Node)tree;
// traverse left
prefix.append('0');
fillDecodeMap(node.left, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
// traverse right
prefix.append('1');
fillDecodeMap(node.right, prefix, treeMap);
prefix.deleteCharAt(prefix.length()-1);
}
}
public static void compress(File file){
try {
Path path = Paths.get(file.getAbsolutePath());
byte[] content = Files.readAllBytes(path);
TreeMap<Integer, String> encodeMap = new TreeMap<Integer, String>();
File nF = new File(file.getName()+"_comp");
nF.createNewFile();
BitFileWriter bfw = new BitFileWriter(nF);
int[] charFreqs = new int[256];
// read each byte and record the frequencies
for (byte b : content){
charFreqs[b&0xFF]++;
}
// build tree
Tree tree = buildTree(charFreqs);
// build TreeMap
fillEncodeMap(tree, new StringBuffer(), encodeMap);
// Writes tree structure in binary form to nF (new file)
bfw.writeByte(encodeMap.size());
for(int i=0; i<charFreqs.length; i++){
if(charFreqs[i] != 0){
ByteBuffer b = ByteBuffer.allocate(4);
b.putInt(charFreqs[i]);
byte[] result = b.array();
bfw.writeByte(i);
for(int j=0; j<4;j++){
bfw.writeByte(result[j]&0xFF);
}
}
}
// Write compressed data
for(byte b : content){
String code = encodeMap.get(b&0xFF);
for(char c : code.toCharArray()){
if(c == '1'){
bfw.write(1);
}
else{
bfw.write(0);
}
}
}
bfw.close();
System.out.println("Compression successful!");
} catch (IOException e) {
e.printStackTrace();
}
}
public static void decompress(File file){
try {
BitFileReader bfr = new BitFileReader(file);
int[] charFreqs = new int[256];
TreeMap<String, Integer> decodeMap = new TreeMap<String, Integer>();
File nF = new File(file.getName()+"_decomp");
nF.createNewFile();
BitFileWriter bfw = new BitFileWriter(nF);
DataInputStream data = new DataInputStream(new BufferedInputStream(new FileInputStream(file)));
int uniqueBytes;
int counter = 0;
int byteCount = 0;
uniqueBytes = data.readUnsignedByte();
// Read frequency table
while (counter < uniqueBytes){
int index = data.readUnsignedByte();
int freq = data.readInt();
charFreqs[index] = freq;
counter++;
}
// build tree
Tree tree = buildTree(charFreqs);
// build TreeMap
fillDecodeMap(tree, new StringBuffer(), decodeMap);
// Skip BitFileReader position to actual compressed code
bfr.skip(uniqueBytes*5);
// Get total number of compressed bytes
for(int i=0; i<charFreqs.length; i++){
if(charFreqs[i] > 0){
byteCount += charFreqs[i];
}
}
// Decompress data and write
counter = 0;
StringBuffer code = new StringBuffer();
while(bfr.hasNextBit() && counter < byteCount){
code.append(""+bfr.nextBit());
if(decodeMap.containsKey(code.toString())){
bfw.writeByte(decodeMap.get(code.toString()));
code.setLength(0);
counter++;
}
}
bfw.close();
bfr.close();
data.close();
System.out.println("Decompression successful!");
}
catch (IOException e) {
e.printStackTrace();
}
}
public static void main(String[] args) {
File f = new File("test");
compress(f);
f = new File("test_comp");
decompress(f);
}
}
EDIT: I found the cause but I don't know how to fix it or why it happens. The problem is that my charFreqs[] array in my decompression method never gets filled (all its values are still at zero AKA all bytes have zero frequency according to the array).