Is there a text string variable type in Adobe PDF

Posted 2019-08-29 23:03

In the example below (from gnupdf.org/Introduction_to_PDF; see also the related question "How to generate plain-text source-code PDF examples that work in a document viewer?"), text is written verbatim using:

(Hello, world!) Tj

Is there a way I could store this "Hello, world!" in a variable (dictionary?), say /MyStringVar, and then output it in multiple places using something like:

(/MyStringVar) Tj

(I've tried the above but couldn't get it to work; /MyStringVar is interpreted verbatim.)

Here is the code, hello.pdf:

%PDF-1.4

1 0 obj  % entry point
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

2 0 obj
<<
  /Type /Pages
  /MediaBox [ 0 0 200 200 ]
  /Count 1
  /Kids [ 3 0 R ]
>>
endobj

3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /Resources <<
    /Font <<
      /F1 4 0 R
    >>
  >>
  /Contents 5 0 R
>>
endobj

4 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /Times-Roman
>>
endobj

5 0 obj  % page content
<<
  /Length 44
>>
stream
BT
70 50 TD
/F1 12 Tf
(Hello, world!) Tj
ET
endstream
endobj

xref
0 6
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
trailer
<<
  /Size 6
  /Root 1 0 R
>>
startxref
492
%%EOF

2 answers
男人必须洒脱
#2 · 2019-08-29 23:35

As an addendum to @Frank's answer:

Deviations

The PDF in that answer deviates from the PDF specification in several places.

  • in the page content stream (object 5) the XObject A is drawn from within a text object:

    BT
    70 50 TD    % this has no effect on `/A Do` - only on the "manual" `Tj`
    /A Do       % do the drawing of XObject A
    

    This is not allowed, cf. section 8.2, especially figure 9 at its end: XObjects may only be inserted at the page description level of the content of a page or XObject.

  • in the XObject content stream (object 6) a font is referenced

    /F1 12 Tf
    

    but no font resources are defined:

    /Resources << /ProcSet [ /PDF ] >>
    

    This is not allowed: the Tf operator shall specify the name of a font resource, that is, an entry in the Font subdictionary of the current resource dictionary (section 9.2.2 of the specification), which here is the resource dictionary of the XObject, not that of the page.

    In very early versions of the PDF format an XObject could inherit resources of the page if it omitted the Resources entry... This construct is obsolete and should not be used by conforming writers (section 7.8.3 of the PDF specification), and in the example at hand the Resources entry is not even omitted.

  • in the XObject content stream (object 6) the text showing operator Tj is used outside a text object:

    stream
      %70 50 TD     % without this `TD` setting, `/A Do` places this in 0,0 - bottom left corner
      /F1 12 Tf
      (Hello, world!) Tj
    endstream
    

    This is not allowed, cf. section 8.2, especially figure 9 at its end: text showing operators are only allowed in text objects, and as XObjects shall not be used inside text objects, this stream cannot be considered to reside in one.

Since it displays the XObject nonetheless, evince seems to be quite forgiving of PDF validity issues, even more so than Adobe Reader, which is already very forgiving but shows that PDF as:

hello.pdf as displayed by Acrobat Reader

i.e. it does not display the XObject at all.

Adapted sample

This section contains an adapted sample that is closer to the specification.

Furthermore the wish of the OP to position the XObject more freely is taken into account:

%PDF-1.4

1 0 obj  % entry point
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

2 0 obj
<<
  /Type /Pages
  /MediaBox [ 0 0 200 200 ]
  /Count 1
  /Kids [ 3 0 R ]
>>
endobj

3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /Resources <<
    /XObject  <<
        /A 6 0 R
    >>
  >>
  /Contents 5 0 R
>>
endobj

4 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /Times-Roman
>>
endobj

5 0 obj  % page content
<<
  /Length 588
>>
stream
 % draw xobject at 0, 0
 /A Do
 % draw xobject at 20, 180
 q
  1 0 0 1 20 180 cm
  /A Do
 Q
 % draw xobject at 100, 100, with different scales and rotations applied
 q
  1 0 0 1 100 100 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
  0.7 0.5 -0.5 0.7 0 0 cm
  /A Do
 Q
 % draw xobject at 120, 180, skewed somewhat
 q
  1 0 0.3 1 120 180 cm
  /A Do
 Q
endstream
endobj

6 0 obj
  << /Type /XObject
     /Subtype /Form
     /FormType 1
     /BBox [ 0 0 1000 1000 ]
     /Matrix [ 1 0 0 1 0 0 ]
     /Resources <<
        /ProcSet [ /PDF ]
        /Font <<
          /F1 4 0 R
        >>
     >>
     /Length 130
  >>
stream
 BT
  /F1 12 Tf
  % To not cut off stuff below the base line, namely parts of the comma
  1 0 0 1 0 3 Tm
  (Hello, world!) Tj
 ET
endstream
endobj

xref
0 7
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
0000000450 00000 n
trailer
<<
  /Size 7
  /Root 1 0 R
>>
startxref
600
%%EOF

(The cross-reference entries and stream lengths are surely wrong.)
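
Wrong offsets and lengths like these can be avoided by letting a small script assemble the file. The following is a minimal sketch in Python, not a full PDF writer: build_pdf and stream_obj are hypothetical helper names introduced here, the page content is shortened to two of the Do calls from the sample, and the objects are written in compact single-line form.

```python
def stream_obj(dict_entries, data):
    """Wrap stream data in a stream object whose /Length matches the data."""
    return (b"<< " + dict_entries + b" /Length %d >>\nstream\n" % len(data)
            + data + b"\nendstream")


def build_pdf(objects):
    """Serialize object bodies (numbered 1..n) with a computed xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))            # byte offset of "N 0 obj"
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)                     # value needed after startxref
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"         # every xref entry is 20 bytes long
    for off in offsets:
        out += b"%010d 00000 n \n" % off
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out)


# Shortened versions of the streams in the adapted sample (an assumption
# made to keep the sketch small).
form = b"BT /F1 12 Tf 1 0 0 1 0 3 Tm (Hello, world!) Tj ET"
page = b"/A Do q 1 0 0 1 20 180 cm /A Do Q"

objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /MediaBox [ 0 0 200 200 ] /Count 1 /Kids [ 3 0 R ] >>",
    b"<< /Type /Page /Parent 2 0 R"
    b" /Resources << /XObject << /A 6 0 R >> >> /Contents 5 0 R >>",
    b"<< /Type /Font /Subtype /Type1 /BaseFont /Times-Roman >>",
    stream_obj(b"", page),
    stream_obj(b"/Type /XObject /Subtype /Form /BBox [ 0 0 1000 1000 ]"
               b" /Resources << /Font << /F1 4 0 R >> >>", form),
]
pdf = build_pdf(objects)
```

Writing the pdf bytes to a file in binary mode should give a file whose xref offsets and /Length values agree with its actual contents.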

This results in the following (as seen in Adobe Reader):

Screenshot of adapted sample

All the "Hello, world!" instances are generated using the single XObject of the PDF.
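
The effect of the repeated `0.7 0.5 -0.5 0.7 0 0 cm` lines in the page stream can be checked numerically. Because this particular matrix has a = d and b = -c, it acts on points like multiplication by the complex number 0.7 + 0.5i, so n applications correspond to that number raised to the n-th power. The sketch below relies on that shortcut (it does not hold for arbitrary matrices such as the skew used at 120, 180); accumulated is a hypothetical helper name.

```python
import cmath
import math

# Stands in for "0.7 0.5 -0.5 0.7 0 0 cm"; valid only because a = d and b = -c.
STEP = complex(0.7, 0.5)


def accumulated(n):
    """Scale factor and rotation in degrees after n of the cm steps."""
    scale, angle = cmath.polar(STEP ** n)
    return scale, math.degrees(angle)


# The sample draws the XObject once before the first cm step and once after
# each of the nine steps, ten times in total.
for n in range(10):
    scale, angle = accumulated(n)
    print(f"draw {n + 1}: scale {scale:.3f}, rotation {angle:.1f} degrees")
```

Each step scales by about 0.86 and rotates by about 35.5 degrees counterclockwise. The angles are reported in the range from -180 to 180 degrees. Since the steps have no translation part, every copy keeps its origin at 100, 100.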

叼着烟拽天下
#3 · 2019-08-29 23:50

PDF has no notion of a variable the way PostScript does. What may come close to what you are trying to achieve (outputting the same text in multiple places) is a form XObject. Just like a page, it has a content stream with graphics objects such as (Hello, world!) Tj, and it can be drawn on a page (or in another XObject) through the graphics operator Do. The operand of Do corresponds to a key in the XObject dictionary in the Resources dictionary of the page. The PDF would look something like this. (Note that the stream lengths, the cross-reference table and the trailer are no longer valid, so consider this pseudo-PDF.)

%PDF-1.4

1 0 obj  % entry point
<<
  /Type /Catalog
  /Pages 2 0 R
>>
endobj

2 0 obj
<<
  /Type /Pages
  /MediaBox [ 0 0 200 200 ]
  /Count 1
  /Kids [ 3 0 R ]
>>
endobj

3 0 obj
<<
  /Type /Page
  /Parent 2 0 R
  /Resources <<
    /Font <<
          /F1 4 0 R
    >>
    /XObject  <<
              /A 6 0 R  % XObject /A is obj 6 0
    >>
  >>                    % /Resources must close here
  /Contents 5 0 R
>>
endobj

4 0 obj
<<
  /Type /Font
  /Subtype /Type1
  /BaseFont /Times-Roman
>>
endobj

5 0 obj  % page content
<<
  /Length 44
>>
stream
BT
70 50 TD    % this has no effect on `/A Do` - only on the "manual" `Tj`
/A Do       % do the drawing of XObject A
/F1 12 Tf   % without this line: "Error: No font in show;"
% if without TD, then the next text is just appended
%-10 50 TD
0 0 TD      % "Td/TD move to the start of next line"; but here like \r
(Hello, world - manual!) Tj
ET
endstream
endobj

6 0 obj
  << /Type /XObject
     /Subtype /Form
     /FormType 1
     /BBox [ 0 0 1000 1000 ]
     /Matrix [ 1 0 0 1 0 0 ]
     /Resources << /ProcSet [ /PDF ] >>
     /Length 58
  >>
stream
  %70 50 TD     % without this `TD` setting, `/A Do` places this in 0,0 - bottom left corner
  /F1 12 Tf
  (Hello, world!) Tj
endstream
endobj

xref
0 7
0000000000 65535 f
0000000010 00000 n
0000000079 00000 n
0000000173 00000 n
0000000301 00000 n
0000000380 00000 n
0000000450 00000 n
trailer
<<
  /Size 7
  /Root 1 0 R
>>
startxref
600
%%EOF

Output in evince:

hello-evince.png

EDIT: The text in the form XObject appears at the lower left corner because the current transformation matrix equals the identity matrix at the time of the show string operation. The initial CTM of the form XObject equals the concatenation of [the CTM in the parent stream when Do is invoked] and [the Matrix entry in the form XObject dictionary], which is the identity in this case. The text matrix is not propagated from the parent stream to the form XObject.
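
The concatenation described above can be worked through with a few lines of Python. This is only a sketch: concat and form_origin are hypothetical helper names, the page's default coordinate system is treated as the identity (as in the explanation above), and the `cm` variant is an illustrative alternative, not part of the PDF shown.

```python
IDENTITY = [1, 0, 0, 1, 0, 0]


def concat(m, ctm):
    """Return m x ctm for PDF matrices written as [a, b, c, d, e, f]."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = ctm
    return [
        a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2,
    ]


def form_origin(parent_ctm, form_matrix=IDENTITY):
    """Where the form XObject's (0, 0) lands in the parent's coordinates."""
    a, b, c, d, e, f = concat(form_matrix, parent_ctm)
    return (e, f)   # the point (0, 0) maps to the translation part


# Page stream above: "70 50 TD" changes the text matrix only, so the CTM
# is still the identity when "/A Do" runs.
print(form_origin(IDENTITY))    # (0, 0)

# Illustrative alternative: "q 1 0 0 1 70 50 cm /A Do Q" changes the CTM.
moved = concat([1, 0, 0, 1, 70, 50], IDENTITY)
print(form_origin(moved))       # (70, 50)
```

To move the XObject, change the CTM with `cm` before `Do`, rather than using a text positioning operator in the parent stream.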
