Probable bug in the iTextSharp parser regarding inline images

It looks like the inline image parser can 'lose its place' if it encounters an inline image whose first byte of image data is a (non-zero) whitespace character. The code works correctly with any other first byte value. I ran into this problem during text parsing.

In the develop branch (4e7d3be18f70cc7516d62266ba1508b53219f227) and in release 5.5.11, InlineImageUtils.cs has ParseInlineImageSamples calling ParseInlineImageDictionary. ParseInlineImageDictionary consumes the whitespace that follows the 'ID' tag. If that whitespace is absent, an IOException ("Unexpected character " + ch + " found after ID in inline image") is thrown.

However, code that runs later in ParseUnfilteredSamples tries to handle the same whitespace again. Its comment says it is meant to handle malformed images that have no whitespace between the 'ID' tag and the image data. As stated above, I believe such a malformed inline image would already have caused an exception. In any case, by this point the parser is already pointing at the first byte of image data.

If you are lucky and the first byte of the image data is either 0 or non-whitespace, the function happens to behave correctly. If the first byte of image data is whitespace, that byte is lost and the parser reads one byte too far. Where it expects the 'EI' tag, it finds just the 'I' and throws InlineImageParseException("EI not found after end of image data").
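To make the double skip concrete, here is a hypothetical, stand-alone simulation of the two steps described above. It is written in Java, whose iText code is what iTextSharp is ported from. It is not iTextSharp code: `DoubleSkipDemo`, `consumeWhitespaceAfterId` and `readSamplesBuggy` are made-up names, a `ByteArrayInputStream` stands in for the tokeniser, and `isWhitespace` is assumed to match `PRTokeniser.IsWhitespace` (the six PDF white-space bytes). The sample assumes an unfiltered image with exactly 3 bytes of data, starting with form feed (12) and immediately followed by "EI". With that layout, the shifted read lands on the 'I'.

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;

// Hypothetical simulation of the reported double skip; NOT iTextSharp code.
final class DoubleSkipDemo {

    // Assumed to match PRTokeniser.IsWhitespace: NUL, HT, LF, FF, CR, SP.
    static boolean isWhitespace(int ch) {
        return ch == 0 || ch == 9 || ch == 10 || ch == 12 || ch == 13 || ch == 32;
    }

    // Step 1 (dictionary parser): exactly one white-space byte must follow "ID".
    static void consumeWhitespaceAfterId(ByteArrayInputStream in) {
        int ch = in.read();
        if (!isWhitespace(ch)) {
            throw new IllegalStateException("Unexpected character " + ch + " found after ID");
        }
    }

    // Step 2 (sample reader as it stands): reads ONE MORE byte and keeps it
    // only if it is non-white-space or 0; a leading form feed is discarded.
    static byte[] readSamplesBuggy(ByteArrayInputStream in, int bytesToRead) {
        byte[] bytes = new byte[bytesToRead];
        int startIndex = 0;
        int shouldBeWhiteSpace = in.read();
        if (!isWhitespace(shouldBeWhiteSpace) || shouldBeWhiteSpace == 0) {
            bytes[0] = (byte) shouldBeWhiteSpace;
            startIndex++;
        }
        // The remaining slots are filled from the stream, so the read
        // runs one byte past the real image data.
        for (int i = startIndex; i < bytesToRead; i++) {
            bytes[i] = (byte) in.read();
        }
        return bytes;
    }

    public static void main(String[] args) {
        // "ID", one space, image data {12, 'A', 'B'}, then "EI".
        byte[] stream = {'I', 'D', ' ', 12, 'A', 'B', 'E', 'I'};
        // Start just after the "ID" operator.
        ByteArrayInputStream in = new ByteArrayInputStream(stream, 2, stream.length - 2);
        consumeWhitespaceAfterId(in);
        byte[] samples = readSamplesBuggy(in, 3);
        System.out.println(Arrays.toString(samples) + " next=" + (char) in.read());
    }
}
```

Running `main` prints `[65, 66, 69] next=I`. The form feed has vanished, the 'E' of "EI" was taken as image data, and the next token starts at 'I'. The same code keeps a first byte of 0 or 'X' intact, which matches the "if you are lucky" case.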

The fix that works for me so far is to remove

int shouldBeWhiteSpace = tokeniser.Read(); // skip next character (which better be a whitespace character - I suppose we could check for this)

and also remove

if (!PRTokeniser.IsWhitespace(shouldBeWhiteSpace) || shouldBeWhiteSpace == 0){ // tokeniser treats 0 as whitespace, but for our purposes, we shouldn't)
    bytes[0] = (byte)shouldBeWhiteSpace;
    startIndex++;
}
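For illustration, here is a hypothetical Java sketch of the sample-reading step once the two fragments quoted above are removed. `FixedSampleReader` and `readSamples` are made-up names, and a `ByteArrayInputStream` stands in for the tokeniser. It assumes the stream is already positioned just after the single white-space byte that ParseInlineImageDictionary consumed, and that the byte count has been computed elsewhere. Every byte from that point on is treated as image data, whatever its value.

```java
import java.io.ByteArrayInputStream;
import java.util.Arrays;

// Hypothetical sketch of the sample reader with the proposed fix applied;
// NOT the actual iTextSharp implementation.
final class FixedSampleReader {

    // No extra "shouldBeWhiteSpace" read: the white space after "ID" was
    // already consumed, so the next byte is the first byte of image data.
    static byte[] readSamples(ByteArrayInputStream in, int bytesToRead) {
        byte[] bytes = new byte[bytesToRead];
        for (int i = 0; i < bytesToRead; i++) {
            int ch = in.read();
            if (ch == -1) {
                throw new IllegalStateException("ran out of data before end of image samples");
            }
            bytes[i] = (byte) ch;
        }
        return bytes;
    }

    public static void main(String[] args) {
        // Positioned after "ID" and its white space: data {12, 'A', 'B'}, then " EI".
        ByteArrayInputStream in = new ByteArrayInputStream(new byte[]{12, 'A', 'B', ' ', 'E', 'I'});
        byte[] samples = readSamples(in, 3);
        System.out.println(Arrays.toString(samples)); // [12, 65, 66]
    }
}
```

Dropping the extra read also drops the tolerance for a missing white-space byte after "ID". As argued above, that case already fails earlier in ParseInlineImageDictionary, so nothing reachable seems to be lost.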

I encountered the problem when parsing a PDF whose inline image data began with decimal 12 (form feed).

This might explain itext-gettextfrompage-exception-with-inline-image.

Note that I hit this bug while calling iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage with a SimpleTextExtractionStrategy. I was not trying to process any images per se.

I am unable to run the NUnit tests at the moment, so I cannot submit a fix myself for now.