Algorithm issue with TIFF CCITT Group 4 decompress

Posted 2019-08-03 15:56

Question:

I work for an engineering design house and we store black and white design drawings in TIFF format compressed with CCITT Group 4 compression.

I am working on a project to improve our software for working with these drawings. To load the raw pixel data into my program, I have to decompress it first.

I tried using LibTiff but gave up on that rather quickly. It wouldn't build, generating over 2000 errors. I found many obvious syntax errors in the library and concluded it was junk. I spent about 3 hours trying to find the part of the library that implements the CCITT Group 4 codec, but had no luck; that code is an incomprehensible mess.

So I am writing my own codec for the program. I have it mostly working well, but I am stuck on a problem. I cannot find good documentation on this format. There are a lot of good overviews that describe generally how 2D Modified Huffman compression works, but I can't find any that have specific, implementation-level details. So I am trying to work it out by using some of the drawing files as examples.

I have vertical and pass modes working well, and my algorithm decompresses about a third of the image properly before it goes off the rails and produces garbage.

I traced the problem to the horizontal mode. My algorithm for the horizontal mode expects to see the horizontal mode code 001, followed by an optional set of makeup codes and a terminating code in the current pen color, followed by another optional set of makeup codes and a terminating code in the opposite color.

This algorithm worked well for a third of the way through the image, but suddenly I encountered a horizontal mode run where the opposite color comes before the current pen color.

The section of the image is a run of 12 black pixels followed by a run of 22 white pixels.
The code bits from that section are 00100000110000111, which I decode as Horizontal (001), 22 White (0000011), 12 Black (0000111). As you can see, that is the opposite of the order in which the pixels appear in the image.

Since my algorithm expects the runs to be listed in image order, it crashes. But the previous 307 instances of horizontal mode in this same image file were all in image order. This is the only reversed one I have found (so far).

Other imaging programs display this file just fine. As a test, I manually edited the bits in the image file to put the two runs in image order, and that causes other imaging programs to crash when decoding the image. This leads me to believe they have some way of knowing that the order is reversed in that instance.

Does anyone know specific implementation-level details about TIFF CCITT G4 encoding that could help me understand how and why the run codes are sometimes reversed?

Thanks, Josh

Answer 1:

CCITT G4 horizontal codes are always encoded as a pair (black/white) or (white/black). It depends on the current pen color. A vertical code will flip the color, but a horizontal code will leave the color unchanged. If the current pen color is black, then you decode a white horizontal code followed by a black. If the current pen color is white, then you will do the opposite.
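
The color bookkeeping described above can be sketched roughly as follows. This is only an illustration, not a decoder: the mode labels "V", "H" and "P" and the function name `color_trace` are made up for this example, and "color" here is assumed to mean the color of the next run to be decoded, which starts as white at the beginning of every line. Pass mode is included on the understanding that, like horizontal mode, it leaves the color unchanged.

```python
WHITE, BLACK = 0, 1

def color_trace(modes, start=WHITE):
    """Return the color of the next run after each mode code in `modes`.

    `modes` is a sequence of hypothetical labels:
    "V" = any vertical code, "H" = horizontal, "P" = pass.
    """
    color = start
    trace = []
    for mode in modes:
        if mode == "V":
            # Vertical mode ends exactly one run, so the color flips.
            color ^= 1
        elif mode in ("H", "P"):
            # Horizontal mode consumes a pair of runs (one of each color)
            # and pass mode does not end a run, so the color is unchanged.
            pass
        else:
            raise ValueError(f"unknown mode label: {mode!r}")
        trace.append(color)
    return trace

# Example: vertical, horizontal, vertical, pass on a fresh line.
print(color_trace(["V", "H", "V", "P"]))
```

The point of keeping a single `color` variable is that the decoder never needs to guess which code table comes first in a horizontal pair; it follows from the modes decoded so far on the line.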



Answer 2:

Code : 00100000110000111

001 : Horizontal Mode

0000011000 : Black RunLength 17

0111 : White RunLength 2

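The black run comes first. Read with the black code table first, the 14 bits after 001 split as 0000011000 + 0111 rather than 0000011 + 0000111.

The run codes are not reversed.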
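
To see why the same 17 bits can be read both ways, here is a small sketch that decodes the horizontal-mode pair with either table first. It is a toy under stated assumptions: the tables contain only the four terminating codes that appear in this thread (a real decoder needs the full terminating and makeup tables), and the names `read_run` and `decode_horizontal` are invented for the example.

```python
# Partial terminating-code tables: only the entries discussed in this thread.
WHITE_CODES = {"0111": 2, "0000011": 22}
BLACK_CODES = {"0000111": 12, "0000011000": 17}

def read_run(bits, pos, table):
    """Match one code from `table` starting at bits[pos]; return (run, new_pos)."""
    for length in range(1, len(bits) - pos + 1):
        run = table.get(bits[pos:pos + length])
        if run is not None:
            return run, pos + length
    raise ValueError("no code matched at position %d" % pos)

def decode_horizontal(bits, first_is_black):
    """Decode '001' plus two run codes; return (first_run, second_run, bits_used)."""
    if not bits.startswith("001"):
        raise ValueError("not a horizontal-mode code")
    if first_is_black:
        first, second = BLACK_CODES, WHITE_CODES
    else:
        first, second = WHITE_CODES, BLACK_CODES
    run1, pos = read_run(bits, 3, first)
    run2, pos = read_run(bits, pos, second)
    return run1, run2, pos

bits = "00100000110000111"
print(decode_horizontal(bits, first_is_black=True))   # black 17, then white 2
print(decode_horizontal(bits, first_is_black=False))  # white 22, then black 12
```

Both readings consume all 17 bits, so the bit string alone cannot tell you which one is meant. Only the color state carried over from the preceding codes on the line picks the right table.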