I want to add text to an existing PDF using Rails, so I did:
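# Path to the existing PDF that serves as the template
filename = "#{Rails.root}/app/assets/images/sample.pdf"
# Generate a new PDF on top of the template and add centered text
Prawn::Document.generate("#{Rails.root}/app/assets/images/full_template.pdf", :template => filename) do
  text "Test", :align => :center
end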
When I open full_template.pdf, I see my template PDF plus my text "Test", but the text is mirrored, as if it had been written in a mirror.
You can find the two PDF documents here:
Original : http://www.sebfie.com/wp-content/uploads/sample.pdf
Generated : http://www.sebfie.com/wp-content/uploads/full_template.pdf
Let's see... [switching into PDF debugging mode].
First, I unpack your full_template.pdf with the help of qpdf, a command-line utility "that does structural, content-preserving transformations on PDF files" (self-description):

The result, qdf---test.pdf, is now easier to analyse in a normal text editor, because all streams are unpacked.
Searching for the string "est" finds us this line:
Poking around a bit more (and looking at qpdf's very helpful comments sprinkled into its output!) we find this: the PDF object where your mirrored string "Test" appears in the original PDF is number 22. It is a completely separate object from the rest of the file's text, and it is also the only one that uses an un-embedded Helvetica font.

So let's extract that separately from the original file:
OK, here the piece [(T) 120 (est)] TJ appears as [<54> 120 <657374>] TJ. We verify this with the help of the ascii command, which prints us a nice ASCII <-> Hex table. That table confirms:

What do the other operators mean? We look them up in the official ISO 32000 PDF-1.7 spec, Annex A, "Operator Summary". Here we find the following bits of info:
Nothing suspicious so far...
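As a cross-check of the hex strings above, here is a minimal Ruby sketch that decodes them without the ascii command. The helper name decode_tj_array is made up for this illustration; it only handles hex strings in angle brackets and skips the numbers between them (such as 120), which are position adjustments rather than text.

```ruby
# Hypothetical helper: decode the <...> hex strings of a PDF TJ array.
# Numbers between the strings (e.g. 120) are not text and are ignored.
def decode_tj_array(tj)
  hex_strings = tj.scan(/<([0-9A-Fa-f]+)>/).flatten
  # "H*" turns pairs of hex digits into bytes: "54" -> "T"
  hex_strings.map { |hex| [hex].pack("H*") }.join
end

puts decode_tj_array("[<54> 120 <657374>]") # prints "Test"
```

So <54> is "T" and <657374> is "est", matching the unpacked [(T) 120 (est)] form.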
However, looking at the other object in which the original page content appears, object number 5, we discover a difference. For example:
Here, before each single action of a Tj (show text), the Tm operator (What is this?!?) is in play. Let's also look up Tm in the PDF spec:

What is strange, however, is that this matrix uses 1 0 0 -1 (instead of the more common 1 0 0 1). This leads to the upside-down mirroring of the text.

Wait a minute!?!
The original text content is stroked with a mirroring text matrix, but still appears normal?? But your added text doesn't use any text matrix of its own, but appears mirrored? What is going on?!
I'm not going to trace this down any further for now. My assumption, however, is that somewhere in the guts of the original PDF, the authoring software defined an 'extended graphics state' which causes all stroking operations to be mirrored by default.
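To make the mirroring concrete, here is a small Ruby sketch of what a matrix starting with 1 0 0 -1 does to a point. A PDF matrix [a b c d e f] maps (x, y) to (a*x + c*y + e, b*x + d*y + f). The helper name apply_matrix and the sample point are assumptions for illustration only; the translation part (e, f) is set to zero here.

```ruby
# Hypothetical helper: apply a PDF-style matrix [a b c d e f] to a point.
def apply_matrix(matrix, x, y)
  a, b, c, d, e, f = matrix
  [a * x + c * y + e, b * x + d * y + f]
end

upright = [1, 0, 0, 1, 0, 0]  # the "more common" 1 0 0 1
flipped = [1, 0, 0, -1, 0, 0] # the 1 0 0 -1 seen in the file

# A point 10 units above the baseline, e.g. the top of a glyph
p apply_matrix(upright, 0, 10) # prints [0, 10]
p apply_matrix(flipped, 0, 10) # prints [0, -10]
```

With 1 0 0 -1 the top of the glyph ends up below the baseline, which is the upside-down mirroring.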
It seems you've done nothing wrong, Sebastien -- you've just been unlucky with your choice of a test object, and got blessed with a rather weird one. Try continuing your 'Prawn' experiments with some other PDFs first...
One can "fix" your full_template.pdf by replacing this line in qdf---test.pdf:
by this one:
and then run a last qpdf command to fix the PDF cross-reference table and stream lengths (now corrupted by our editing):
qpdf qdf---test.pdf full_template---fixed.pdf
The console output will show you what it does:
The "fixed" PDF will show the text un-mirrored.
My Pull Request has been merged, so the issue is now fixed in the prawn-templates gem. The fix was to reset the graphics state before adding any content to the PDF.

This was happening because Google Chrome and Google Docs export PDFs with a transformation matrix that vertically flips all of the content. By default, PDF coordinates are measured from the bottom-left corner. Google's custom transformation means that they can calculate coordinates from the top-left corner of the PDF, which does make more sense to me.
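A rough sketch of why the template's own text looked fine while the added text did not: two vertical flips cancel each other out. The numbers below are assumptions for illustration (792 as a page height, 72 and 100 as a text position); I did not copy them from sample.pdf, and concat is a made-up helper, not part of Prawn.

```ruby
# Hypothetical helper: combine two PDF matrices [a b c d e f],
# applying `first` and then `second`.
def concat(first, second)
  a1, b1, c1, d1, e1, f1 = first
  a2, b2, c2, d2, e2, f2 = second
  [
    a1 * a2 + b1 * c2,      a1 * b2 + b1 * d2,
    c1 * a2 + d1 * c2,      c1 * b2 + d1 * d2,
    e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2
  ]
end

page_flip   = [1, 0, 0, -1, 0, 792]  # assumed page-level vertical flip
template_tm = [1, 0, 0, -1, 72, 100] # template text: its own 1 0 0 -1 matrix
no_tm       = [1, 0, 0, 1, 0, 0]     # added text: no text matrix of its own

p concat(template_tm, page_flip) # prints [1, 0, 0, 1, 72, 692]
p concat(no_tm, page_flip)       # prints [1, 0, 0, -1, 0, 792]
```

The template text ends up with 1 0 0 1 (upright), while text added without its own matrix inherits the flip. Resetting the graphics state before adding content avoids inheriting it.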
P.S. Thanks very much to @KurtPfeifle for the very helpful answer! I wouldn't have got this far without that information.