Error in PDF root object

Posted 2019-05-07 16:19

This PDF root object causes Adobe Reader to fail. Other PDF readers such as Foxit, Nuance, Evince, and SumatraPDF open the file without problems. The problem seems to be /Dests, which requires an indirect object (per the PDF reference). Deleting the /Dests << >> entry lets Adobe Reader open the file, but printing then fails. All the other readers work fine without /Dests. Any ideas on how to correct the syntax in the following root object example?

  17 0 obj
  <<
    /Type /Catalog
    /Pages 2 0 R
    /Outlines 15 0 R
    /PageMode /UseOutlines
    /Dests <<
             /__WKANCHOR_2 8 0 R
             /#8d#c2#ca#ebs#e4#60#00#9e#97l#b9#80#1b#cb#86sQR#83 9 0 R
           >>
  >>
  endobj

3 answers
劫难
#2 · 2019-05-07 16:44

/Dests is supposed to be a dictionary (pairs of /Key value) containing names (Keys) and corresponding destinations (values). The /Dests keyword first appeared in PDF 1.1.

PDF 1.1 allowed only name objects as keys. PDF 1.2 added the option of byte-string keys (through the Dests name tree in the catalog's Names dictionary).

So which PDF version does your file claim to be?
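One quick way to answer that question is to read the `%PDF-x.y` header near the start of the file. The following is only a minimal sketch: the function name `claimed_pdf_version` is made up for illustration, and the 1024-byte search window is an arbitrary choice to tolerate leading junk. Note also that from PDF 1.4 on, a /Version entry in the catalog can override the header, which this sketch does not check.

```python
import re


def claimed_pdf_version(data: bytes):
    """Return the version string from the %PDF-x.y header, or None if absent."""
    # Search only the start of the file; 1024 bytes is an arbitrary window.
    match = re.search(rb"%PDF-(\d+\.\d+)", data[:1024])
    return match.group(1).decode("ascii") if match else None


# Hypothetical usage with the file name mentioned in this thread:
# with open("virkerikke.pdf", "rb") as fh:
#     print(claimed_pdf_version(fh.read()))
```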

From the spec for PDF 1.7 ("ISO 32000-1"), describing the meaning of /Dests:

In PDF 1.1, the correspondence between name objects and destinations shall be defined by the Dests entry in the document catalogue (see 7.7.2, “Document Catalog”). The value of this entry shall be a dictionary in which each key is a destination name and the corresponding value is either an array defining the destination, using the syntax shown in Table 151, or a dictionary with a D entry whose value is such an array.

我想做一个坏孩纸
#3 · 2019-05-07 16:50

OK, found a few spare minutes...

So the first thing I noticed is that all the other readers may indeed open the file (I only tested a few), but they spit out lots and lots of warnings and error messages. (Try Ghostscript: gs virkerikke.pdf, or try evince.) Among other things, they complain about a damaged xref table in the PDF.

xpdf complains:

[....]
Error: Invalid XRef entry
Error: Invalid XRef entry
Error: Invalid XRef entry
Error (157): Unterminated string
Error (159): End of file inside dictionary

gv complains:

Warning: translation table syntax error: Unknown keysym name:  apLineDel
Warning: ... found while parsing '<Key>apLineDel:   GV_Page(page+5)     '
Warning: String to TranslationTable conversion encountered errors

evince complains:

[....]
Error: Invalid XRef entry
Error: Invalid XRef entry
Error: Invalid XRef entry
Error (157): Unterminated string
Error (159): End of file inside dictionary
Error (157): Unterminated string
Error (159): End of file inside dictionary
Error (157): Unterminated string
Error (159): End of file inside dictionary
[....]
Error (1918): Unterminated string
Error (1920): End of file inside dictionary

gs complains:

**** Warning: File has a corrupted %%EOF marker, or garbage after %%EOF.

mupdf complains:

+ pdf/pdf_xref.c:60: pdf_read_start_xref(): cannot find startxref
| pdf/pdf_xref.c:477: pdf_load_xref(): cannot read startxref
\ pdf/pdf_xref.c:532: pdf_open_xref_with_stream(): trying to repair
warning: ignoring invalid character in hex string: '!'
warning: ignoring invalid character in hex string: 'O'
warning: ignoring invalid character in hex string: 'T'
warning: ignoring invalid character in hex string: 'Y'
[....]

qpdf --qdf complains:

virkerikke.pdf (object 17 0, file position 2234): null character not allowed in name token
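To see what qpdf is objecting to, it can help to decode the name token by hand. In a PDF name, `#` followed by two hex digits stands for a single byte, so `#00` is a null byte. The sketch below is illustrative only; `decode_pdf_name` is a hypothetical helper, not part of any PDF library.

```python
import re


def decode_pdf_name(token: str) -> bytes:
    """Decode a PDF name token such as '/A#20B' into its raw bytes."""
    body = token[1:] if token.startswith("/") else token
    # Each '#' followed by two hex digits encodes exactly one byte.
    return re.sub(
        rb"#([0-9A-Fa-f]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        body.encode("latin-1"),
    )


name = "/#8d#c2#ca#ebs#e4#60#00#9e#97l#b9#80#1b#cb#86sQR#83"
raw = decode_pdf_name(name)
# Report the decoded length and whether a null byte is present.
print(len(raw), b"\x00" in raw)
```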

OK, now opening this crappy file in a text editor and trying to repair it. What I find is that this file (32746 bytes in size) has some serious syntax problems:

  1. Garbage after %%EOF: There is a complete and syntactically correct HTML file glued to the PDF after its %%EOF marker, with the title "Wkhtmltopdf - Teknisk regelverk". Its size is 11878 bytes. Delete this part, and you'll have a 'better' PDF of only 20868 bytes... though Acrobat/Adobe Reader still won't open it after you save the edited file.
  2. Invalid character in name token: This is inside the name token /#8d#c2#ca#ebs#e4#60#00#9e#97l#b9#80#1b#cb#86sQR#83, which appears twice in this file. I had already said in my first comments that this key didn't look trustworthy to me, because it contains only very few ASCII characters but lots of binary bytes (written in their hexadecimal representation). What I had overlooked was that it even contains a #00, which is the PDF representation of a null character, and that is illegal in PDF name tokens. Replace that name token with another (made-up) one of exactly the same length, in both occurrences, so that the byte offsets recorded in the file stay valid. I chose /aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa. Save the edited file.
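The two manual repairs in the list above could be scripted roughly as follows. This is a sketch under assumptions, not the exact procedure used in this answer: `repair_pdf` is a hypothetical helper, it assumes the last `%%EOF` in the file is the real end of the PDF, and it replaces every occurrence of the bad name with a placeholder of identical length so that no byte offsets shift.

```python
def repair_pdf(data: bytes, bad_name: bytes, placeholder: bytes) -> bytes:
    """Cut everything after the last %%EOF and swap the bad name token."""
    if len(bad_name) != len(placeholder):
        # Equal length keeps the byte offsets stored in the xref table valid.
        raise ValueError("placeholder must be exactly as long as the bad name")
    marker = b"%%EOF"
    end = data.rfind(marker)
    if end == -1:
        raise ValueError("no %%EOF marker found")
    # Keep everything up to and including the last %%EOF, plus a newline.
    trimmed = data[: end + len(marker)] + b"\n"
    return trimmed.replace(bad_name, placeholder)


bad = b"/#8d#c2#ca#ebs#e4#60#00#9e#97l#b9#80#1b#cb#86sQR#83"
good = b"/" + b"a" * (len(bad) - 1)

# Tiny synthetic stand-in for the real file, for demonstration only.
sample = b"%PDF-1.4\n<< " + bad + b" 9 0 R >>\n%%EOF\n<html>junk</html>"
fixed = repair_pdf(sample, bad, good)
```

On the real file one would read `virkerikke.pdf` in binary mode, pass its bytes through `repair_pdf`, and write the result to a new file rather than overwriting the original.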

Now even Acrobat/Adobe Reader will open this repaired file without complaining. The 'other readers' also work better with this file now: they spit out fewer warnings and can identify some metadata (such as the creation date and producer == wkhtmltopdf) that they were unable to read from the original file.

成全新的幸福
#4 · 2019-05-07 16:50

Seems pretty straightforward. Move the /Dests dictionary into its own object.

Rather than

17 0 obj
<<
  /Type /Catalog
  /Pages 2 0 R
  /Outlines 15 0 R
  /PageMode /UseOutlines
  /Dests <<
    /__WKANCHOR_2 8 0 R
    /#8d#c2#ca#ebs#e4#60#00#9e#97l#b9#80#1b#cb#86sQR#83 9 0 R
  >>
>>
endobj

you should instead have:

17 0 obj
<<
  /Type /Catalog
  /Pages 2 0 R
  /Outlines 15 0 R
  /PageMode /UseOutlines
  /Dests 1234 0 R
>>
endobj
1234 0 obj
<</__WKANCHOR_2 8 0 R/#8d#c2#ca#ebs#e4#60#00#9e#97l#b9#80#1b#cb#86sQR#83 9 0 R>>
endobj

The object number 1234 is just a placeholder; the real one will be whatever unused number your PDF software assigns.

And how to move the /Dests dictionary out of the root into its own object is going to depend entirely on what PDF software you're using. "A Hex Editor" is an option, but then you're over on SuperUser instead of here on StackOverflow... technically. I suspect you might get a mulligan on that one. I'd let it slide myself.
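For illustration only, here is a text-level sketch of the transformation described in this answer. It is not tied to any particular PDF library: `split_out_dests` is a hypothetical helper, it assumes the inline /Dests dictionary has no nested `<< >>` pairs, and it only produces the text of the two objects. Writing them back into a file changes byte offsets, so the xref table would still have to be rebuilt by whatever PDF tool you use.

```python
import re


def split_out_dests(catalog_obj: str, new_obj_num: int):
    """Return (rewritten catalog text, text of the new /Dests object)."""
    # Assumes the inline /Dests dictionary contains no nested << >> pairs.
    pattern = re.compile(r"/Dests\s*<<(.*?)>>", re.DOTALL)
    match = pattern.search(catalog_obj)
    if match is None:
        raise ValueError("no inline /Dests dictionary found")
    # Collapse the whitespace between the dictionary entries.
    entries = " ".join(match.group(1).split())
    new_catalog = (
        catalog_obj[: match.start()]
        + f"/Dests {new_obj_num} 0 R"
        + catalog_obj[match.end():]
    )
    new_object = f"{new_obj_num} 0 obj\n<< {entries} >>\nendobj\n"
    return new_catalog, new_object


catalog = (
    "17 0 obj\n<<\n  /Type /Catalog\n  /Pages 2 0 R\n"
    "  /Dests <<\n    /__WKANCHOR_2 8 0 R\n  >>\n>>\nendobj\n"
)
# 1234 is the placeholder object number used in the answer above.
new_catalog, new_object = split_out_dests(catalog, 1234)
print(new_catalog)
print(new_object)
```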
