The current project in my Systems Programming course is to write an ASCII compressor that strips the always-zero top bit from each byte and writes the packed result to a file.
To make decompression possible, the original file size is written first as a header, followed by the packed char bytes. There are two test files: one is 63 bytes long and the other is 5,344,213 bytes. My code below works as expected on the first file, writing 56 bytes of compressed text plus a 4-byte header.
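The header write itself isn't shown below; as a rough sketch (the helper name and the fstat-based sizing are illustrative, not my exact code), it boils down to writing the input file's size as a 4-byte value before the compression loop starts:

//Illustrative sketch only, not the exact header code: write the
//original size as a 4-byte header before compressing.
#include <stdint.h>
#include <sys/stat.h>
#include <unistd.h>

void writeSizeHeader(int openReadFile, int openWriteFile){
    struct stat fileInfo;
    fstat(openReadFile, &fileInfo);                             //size of the original file
    uint32_t originalSize = (uint32_t) fileInfo.st_size;
    write(openWriteFile, &originalSize, sizeof(originalSize));  //4-byte header
}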
However, when I run it on the long test file, the compressed version comes out only 3 bytes shorter than the original, when it should be roughly 652 KiB smaller (one bit out of every eight, i.e. about 12.5% of the original size). I've worked out the binary bit-shift values by hand for the first two write loops of the long test file, and they match what is being recorded on my test printout.
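As a sanity check on that expectation, here is a standalone back-of-the-envelope calculation (not part of the compressor; it ignores how the partial last group is padded):

//Standalone sanity check of the expected output size: every 8 input
//bytes should pack into 7 output bytes, plus the 4-byte header.
#include <stdio.h>

int main(void){
    long original = 5344213L;                          //long test file size
    long packed = (original / 8) * 7 + (original % 8); //leftover bytes counted as-is
    printf("expect about %ld packed bytes (+4 header), saving about %ld bytes\n",
           packed, original - packed);
    return 0;
}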
while ( (characters = read(openReadFile, unpacked, BUFFER)) > 0 ){
    unsigned char packed[7]; //compression storage
    int i, j, k, writeCount, endLength, endLoop;
    //loop through the buffer array
    for (i = 0; i < characters-1; i++){
        j = i % 7;
        //fill up the compressed array
        packed[j] = packer(unpacked[i], unpacked[i+1], j);
        if (j == 6){
            writeCalls++; //track how many calls made
            writeCount = write(openWriteFile, packed, sizeof(packed));
            int packedSize = writeCount;
            for (k = 0; k < 7 && writeCalls < 10; k++)
                printf("%X ", (int)packed[k]);
            totalWrittenBytes += packedSize;
            printf(" %d\n", packedSize);
            memset(&packed[0], 0, sizeof(packed)); //clear array
            if (writeCount < 0)
                printOpenErrors(writeCount);
        }
        //end of buffer array loop
        endLength = characters - i;
        if (endLength < 7){
            for (endLoop = 0; endLoop < endLength-1; endLoop++){
                packed[endLoop] = packer(unpacked[endLoop], unpacked[endLoop+1], endLoop);
            }
            packed[endLength] = calcEndBits(endLength, unpacked[endLength]);
        }
    } //end buffer array loop
} //end file read loop
The packer function:
//calculates the compressed byte value for the array
char packer(char i, char j, int k){
    char packStyle;
    //shift bits based on mod value with 8
    switch(k){
        case 0:
            packStyle = ((i & 0x7F) << 1) | ((j & 0x40) >> 6);
            break;
        case 1:
            packStyle = ((i & 0x3F) << 2) | ((j & 0x60) >> 5);
            break;
        case 2:
            packStyle = ((i & 0x1F) << 3) | ((j & 0x70) >> 4);
            break;
        case 3:
            packStyle = ((i & 0x0F) << 4) | ((j & 0x78) >> 3);
            break;
        case 4:
            packStyle = ((i & 0x07) << 5) | ((j & 0x7C) >> 2);
            break;
        case 5:
            packStyle = ((i & 0x03) << 6) | ((j & 0x7E) >> 1);
            break;
        case 6:
            packStyle = ( (i & 0x01 << 7) | (j & 0x7F));
            break;
    }
    return packStyle;
}
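To make the printout easy to reproduce, a minimal harness along these lines (compiled together with the packer() above; the sample input is arbitrary) packs one group of bytes and prints it for hand-checking:

//Spot-check harness, separate from the compressor: runs one group of
//sample bytes through packer() and prints the packed result.
#include <stdio.h>

char packer(char i, char j, int k);   //defined above

int main(void){
    unsigned char in[] = "ABCDEFGH";  //8 sample ASCII bytes
    unsigned char out[7];
    int k;
    for (k = 0; k < 7; k++)
        out[k] = (unsigned char) packer(in[k], in[k+1], k);
    for (k = 0; k < 7; k++)
        printf("%02X ", out[k]);      //with this input the first two bytes are 83 0A
    printf("\n");
    return 0;
}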
I've verified that 7 bytes are written out every time the packed buffer is flushed, and that 763,458 write calls are made for the long file, which matches the 5,344,206 bytes written (763,458 × 7).
I'm getting the same hex codes on the printout that I worked out in binary beforehand, and I can see the top bit of every byte being removed. So why aren't the bit shifts being reflected in the results?