ASCII compressor works for short test file, not on

2019-04-28 17:57发布

问题:

The current project in Systems Programming is to come up with an ASCII compressor that removes the top zero bit and writes the contents to the file.

In order to facilitate decompression, the original file size is written to file, then the compressed char bytes. There are two files to run tests on- one that is 63 bytes long, and the other is 5344213 bytes. My code below works as expected for the first test file, as it writes 56 bytes of compressed text plus 4 bytes of file header.

However, when I try it on the long test file, the compressed version is 3 bytes shorter than the original, when it should be roughly 749KiB smaller, or 14% of original size. I've worked out the binary bit shift values for the first two write loops of the long test file, and they match up what is being recorded on my test printout.

while ( (characters= read(openReadFile, unpacked, BUFFER)) >0 ){
   unsigned char packed[7]; //compression storage
   int i, j, k, writeCount, endLength, endLoop;

    //loop through the buffer array
    for (i=0; i< characters-1; i++){
        j= i%7; 

        //fill up the compressed array
        packed[j]= packer(unpacked[i], unpacked[i+1], j);

        if (j == 6){
            writeCalls++; //track how many calls made

            writeCount= write(openWriteFile, packed, sizeof (packed));
            int packedSize= writeCount;
            for (k=0; k<7 && writeCalls < 10; k++)
                printf("%X ", (int)packed[k]);      

            totalWrittenBytes+= packedSize;
            printf(" %d\n", packedSize);
            memset(&packed[0], 0, sizeof(packed)); //clear array

            if (writeCount < 0)
                printOpenErrors(writeCount);
        }
        //end of buffer array loop
        endLength= characters-i;
        if (endLength < 7){

            for (endLoop=0; endLoop < endLength-1; endLoop++){
                packed[endLoop]= packer(unpacked[endLoop], unpacked[endLoop+1], endLoop);
            }

            packed[endLength]= calcEndBits(endLength, unpacked[endLength]);
        }
    } //end buffer array loop
} //end file read loop

The packer function:

//calculates the compressed byte value for the array
char packer(char i, char j, int k){
    char packStyle;

    switch(k){
        //shift bits based on mod value with 8
        case 0:
                packStyle= ((i & 0x7F) << 1) | ((j & 0x40) >> 6);
            break;
        case 1:
            packStyle= ((i & 0x3F) << 2) | ((j & 0x60) >> 5);
            break;
        case 2:
            packStyle= ((i & 0x1F) << 3) | ((j & 0x70) >> 4);
            break;
        case 3:
            packStyle= ((i & 0x0F) << 4) | ((j & 0x78) >> 3);
            break;
        case 4:
            packStyle= ((i & 0x07) << 5) | ((j & 0x7C) >> 2);
            break;
        case 5:
            packStyle= ((i & 0x03) << 6) | ((j & 0x7E) >> 1);
            break;
        case 6:
            packStyle= ( (i & 0x01 << 7) | (j & 0x7F));
            break;
    }

    return packStyle;
}

I've verified that there are 7 bytes written out every time the packed buffer is flushed, and there are 763458 write calls made for the long file, which match up to 5344206 bytes written.

I'm getting the same hex codes from the printout that I worked out in binary beforehand, and I can see the top bit of every byte removed. So why aren't the bit shifts being reflected in the results?

回答1:

Ok, since this is homework I'll just give you a few hints without giving out a solution.

First are you sure that the 56 bytes you get on the first file are the right bytes? Sure the count looks good, but you got lucky on count (proof is the second test file). I can immediately see at least two key mistakes in the code.

To make sure you have the right output, the byte count is not enough. You need to dig deeper. How about checking the bytes themselves one by one. 63 characters is not that much to go heh? There are many ways you can do this. You could use od (a pretty good Linux/Unix tool to look at the binary contents of files, if you're on Windows use some Hex editor). Or you could print out debug information from within your program.

Good luck.



回答2:

Why do you expect the output to be 14% shorter than the input? How could it, when you store a byte into packed as many times as there are input bytes, except for the last group? The size of the output will always be within 7 of the size of the input.