How to generate a plain-text source-code PDF example?

Posted 2019-01-25 23:05

I just found the Adobe Forums post "Simple Text String Example in specification broken", which got me interested in finding plain-text source-code PDF examples.

Through that post, I eventually found that the PDF 1.7 spec has, on page 699, the appendix "Annex H (informative) Example PDF files"; from there, I wanted to try "H.3 Simple Text String Example" (the classic "Hello World").

So I saved the following as hello.pdf (note that when you copy from PDF32000_2008.pdf, you may get "%PDF-1. 4", that is, a space inserted after "1.", which must be removed):

%PDF-1.4
1 0 obj
  << /Type /Catalog
      /Outlines 2 0 R
      /Pages 3 0 R
  >>
endobj

2 0 obj
  << /Type /Outlines
      /Count 0
  >>
endobj

3 0 obj
  << /Type /Pages
      /Kids [ 4 0 R ]
      /Count 1
  >>
endobj

4 0 obj
  << /Type /Page
      /Parent 3 0 R
      /MediaBox [ 0 0 612 792 ]
      /Contents 5 0 R
      /Resources << /ProcSet 6 0 R
      /Font << /F1 7 0 R >>
  >>
>>
endobj

5 0 obj
  << /Length 73 >>
stream
  BT
    /F1 24 Tf
    100 100 Td
    ( Hello World ) Tj
  ET
endstream
endobj

... and then I tried to open it:

evince hello.pdf

... however, evince cannot open it: "Unable to open document / PDF document is damaged"; and also:

Error: PDF file is damaged - attempting to reconstruct xref table...
Error: Couldn't find trailer dictionary
Error: Couldn't read xref table

I also checked with qpdf:

$ qpdf --check hello.pdf
WARNING: hello.pdf: file is damaged
WARNING: hello.pdf: can't find startxref
WARNING: hello.pdf: Attempting to reconstruct cross-reference table
hello.pdf: unable to find trailer dictionary while recovering damaged file

Where am I going wrong with this?

Many thanks in advance for any answers,
Cheers!

2 answers
放荡不羁爱自由
#2 · 2019-01-25 23:20

You should append a (syntactically correct) xref and trailer section to the end of the file. That means: each object in your PDF needs one line in the xref table, even if the byte offset isn't correctly stated. Then Ghostscript, pdftk or qpdf can re-establish a correct xref and render the file:

[...]
endobj
xref 
0 8 
0000000000 65535 f 
0000000010 00000 n 
0000000020 00000 n 
0000000030 00000 n 
0000000040 00000 n 
0000000050 00000 n 
0000000060 00000 n 
0000000070 00000 n 
trailer 
<</Size 8/Root 1 0 R>> 
startxref 
555 
%%EOF 
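
For an arbitrary number of objects, a placeholder tail like the one above could be generated rather than typed. This is only a minimal Python sketch: placeholder_tail is a hypothetical helper name, the offsets and the startxref value are deliberately dummy values (as in the example above), and the result is meant purely as input for a repair tool, not as a valid file on its own.

```python
def placeholder_tail(num_objects, root=1):
    """Return a syntactically valid xref/trailer section with dummy offsets."""
    size = num_objects + 1  # entry 0 is the head of the free list
    lines = ["xref", "0 %d" % size, "0000000000 65535 f "]
    for num in range(1, size):
        # One 'in use' line per object; the offset is a placeholder, not real.
        lines.append("%010d 00000 n " % (num * 10))
    lines += [
        "trailer",
        "<</Size %d/Root %d 0 R>>" % (size, root),
        "startxref",
        "0",  # dummy value; a repair tool is expected to recompute it
        "%%EOF",
    ]
    return "\n".join(lines) + "\n"


tail = placeholder_tail(7)  # the Hello World example has objects 1..7
```

Appending tail to the truncated file (in text mode, after the last endobj) gives one xref line per object, which is the precondition the answer describes.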
老娘就宠你
#3 · 2019-01-25 23:42

Ah, damn it: I had copied only part of the code. The code in the question is what appears on page 701; the page footer that follows confused me, but the listing actually continues on page 702 :/

(EDIT: also see Introduction to PDF - GNUpdf (archive) for a similar, more detailed example)

So here is the complete code:

%PDF-1.4
1 0 obj
  << /Type /Catalog
      /Outlines 2 0 R
      /Pages 3 0 R
  >>
endobj

2 0 obj
  << /Type /Outlines
      /Count 0
  >>
endobj

3 0 obj
  << /Type /Pages
      /Kids [ 4 0 R ]
      /Count 1
  >>
endobj

4 0 obj
  << /Type /Page
      /Parent 3 0 R
      /MediaBox [ 0 0 612 792 ]
      /Contents 5 0 R
      /Resources << /ProcSet 6 0 R
      /Font << /F1 7 0 R >>
  >>
>>
endobj

5 0 obj
  << /Length 73 >>
stream
  BT
    /F1 24 Tf
    100 100 Td
    ( Hello World ) Tj
  ET
endstream
endobj

6 0 obj
  [ /PDF /Text ]
endobj

7 0 obj
  << /Type /Font
    /Subtype /Type1
    /Name /F1
    /BaseFont /Helvetica
    /Encoding /MacRomanEncoding
  >>
endobj

xref
0 8
0000000000 65535 f
0000000009 00000 n
0000000074 00000 n
0000000120 00000 n
0000000179 00000 n
0000000364 00000 n
0000000466 00000 n
0000000496 00000 n

trailer
  << /Size 8
    /Root 1 0 R
  >>
startxref
625
%%EOF

Indeed, as the error messages said, the xref section was missing!

However, that is still not the end of it: while this document does open in evince, evince still complains:

$ evince hello.pdf 
Error: PDF file is damaged - attempting to reconstruct xref table...

... and so will qpdf:

$ qpdf --check hello.pdf
WARNING: hello.pdf: file is damaged
WARNING: hello.pdf (file position 625): xref not found
WARNING: hello.pdf: Attempting to reconstruct cross-reference table
checking hello.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
WARNING: hello.pdf (object 5 0, file position 436): attempting to recover stream length
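
To see which numbers are off without a repair tool, the declared offsets can be compared with where the objects really start. The following is a rough Python sketch under several assumptions: the file has a single classic xref table, every object header sits at the start of a line as "N 0 obj", and no stream data happens to contain text that looks like an object header. The function names are made up for this illustration, and stream lengths are not checked.

```python
import re


def actual_offsets(data):
    """Map object number -> byte offset where its 'N 0 obj' line really starts."""
    return {int(m.group(1)): m.start()
            for m in re.finditer(rb"(?m)^(\d+) 0 obj\b", data)}


def declared_offsets(data):
    """Read the in-use ('n') entries of the first classic xref table."""
    head = re.search(rb"(?m)^xref\s*\n(\d+) (\d+)", data)
    first, count = int(head.group(1)), int(head.group(2))
    entries = re.findall(rb"(\d{10}) (\d{5}) ([nf])", data[head.end():])[:count]
    return {first + i: int(off)
            for i, (off, _gen, kind) in enumerate(entries) if kind == b"n"}


def xref_problems(data):
    """Return readable mismatches between declared and real byte offsets."""
    problems = []
    real = actual_offsets(data)
    for num, off in sorted(declared_offsets(data).items()):
        if real.get(num) != off:
            problems.append("object %d: xref says %d, found at %s"
                            % (num, off, real.get(num)))
    # The number after 'startxref' must be the offset of the 'xref' keyword.
    real_xref = re.search(rb"(?m)^xref\b", data).start()
    declared_xref = int(re.search(rb"startxref\s+(\d+)", data).group(1))
    if real_xref != declared_xref:
        problems.append("startxref says %d, xref keyword is at %d"
                        % (declared_xref, real_xref))
    return problems
```

Usage would be along the lines of xref_problems(open("hello.pdf", "rb").read()); an empty list means the declared offsets match under the assumptions above.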

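So to actually get a proper example, as the Adobe Forums post "Simple Text String Example in specification broken" points out, the xref table needs to be reconstructed so that it has correct byte offsets. The qpdf output above also shows that the startxref value (625) does not point at the xref keyword, and that the /Length of the stream in object 5 does not match the stream data.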

And in order to do this, we can use pdftk to "Repair a PDF's Corrupted XREF Table and Stream Lengths (If Possible)":

$ pdftk hello.pdf output hello_repair.pdf

... and now hello_repair.pdf opens in evince without a problem - and qpdf reports:

$ qpdf --check hello_repair.pdf
checking hello_repair.pdf
PDF Version: 1.4
File is not encrypted
File is not linearized
No errors found
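
Since the xref offsets, the startxref value and the stream /Length all depend on exact byte counts, another way to avoid counting by hand is to generate the file from a script. Below is a minimal Python sketch of that idea; it is not what pdftk does internally. build_pdf, stream_object and the objects list are names made up for this illustration, and the object bodies are the ones from the listing above, written on single lines.

```python
def stream_object(data):
    """Wrap raw content bytes in a stream object body with a matching /Length."""
    return b"<< /Length %d >>\nstream\n" % len(data) + data + b"\nendstream"


def build_pdf(objects):
    """Serialize bodies as objects 1..N, then append xref, trailer, startxref."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object's 'N 0 obj'
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)  # byte offset of the 'xref' keyword
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"  # each entry is exactly 20 bytes
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


content = b"BT\n/F1 24 Tf\n100 100 Td\n( Hello World ) Tj\nET"
objects = [
    b"<< /Type /Catalog /Outlines 2 0 R /Pages 3 0 R >>",
    b"<< /Type /Outlines /Count 0 >>",
    b"<< /Type /Pages /Kids [ 4 0 R ] /Count 1 >>",
    b"<< /Type /Page /Parent 3 0 R /MediaBox [ 0 0 612 792 ] /Contents 5 0 R"
    b" /Resources << /ProcSet 6 0 R /Font << /F1 7 0 R >> >> >>",
    stream_object(content),
    b"[ /PDF /Text ]",
    b"<< /Type /Font /Subtype /Type1 /Name /F1 /BaseFont /Helvetica"
    b" /Encoding /MacRomanEncoding >>",
]
pdf = build_pdf(objects)
```

Writing pdf to a file opened in binary mode ("wb") keeps the byte offsets intact; text mode on some platforms would rewrite the line endings and shift them again. The result can then be checked with qpdf --check as above.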

Well, hope this helps someone,
Cheers!
