I need to add metadata to a PDF that I am creating with prawn. That metadata will be extracted later, probably with pdf-reader. It will contain internal document numbers and other information needed by downstream tools.
It would be convenient to associate metadata with each page of the PDF. According to the PDF specification, I can store per-page private data in a "page-piece dictionary". Section 14.5 states:
A page-piece dictionary (PDF 1.3) may be used to hold private
conforming product data. The data may be associated with a page or
form XObject by means of the optional PieceInfo entry in the page
object (see Table 30) or form dictionary (see Table 95). Beginning
with PDF 1.4, private data may also be associated with the PDF
document by means of the PieceInfo entry in the document catalogue
(see Table 28).
How can I set a "page-piece dictionary" with prawn? I'm using prawn 0.12.0.
If that's not possible, how else can I achieve my goal of storing metadata about each page, either at the page level, or at the document level?
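For document-level metadata, you can pass an :info hash to Prawn::Document.new, which populates the PDF's Info dictionary. See the Prawn source: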
https://github.com/prawnpdf/prawn/commit/131082af5abb71d83de0e2005ecceaa829224904
info = { :Title        => "Sample METADATA",
         :Author       => "Me",
         :Subject      => "Not Working",
         :CreationDate => Time.now }
@pdf = Prawn::Document.new(:template => filename, :info => info)
One way is to do none of the above; that is, don't attach the metadata as a page-piece dictionary, and don't attach it with prawn. Instead, attach the metadata as a file attachment using the pdftk command-line tool.
To do it this way, create a file with the metadata. For example, the file metadata.yaml might contain:
---
- :document_id: '12345'
  :account_id: 10
  :page_numbers:
  - 1
  - 2
  - 3
- :document_id: '12346'
  :account_id: 24
  :page_numbers:
  - 4
After creating the PDF file with prawn, use pdftk to attach the metadata file to it:
$ pdftk foo.pdf attach_files metadata.yaml output foo-with-attachment.pdf
Since pdftk will not modify a file in place, the output file must be different from the input file.
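If the PDF is generated from a Ruby script, the same pdftk call can be made from that script. This is a rough sketch, assuming pdftk is installed and on the PATH; the helper names pdftk_attach_command and attach_metadata are hypothetical, and the only pdftk syntax used is the attach_files command shown above.

```ruby
# Hypothetical helper: build the pdftk argument list for attaching a file.
# Raises if input and output resolve to the same path, because pdftk
# cannot modify a file in place.
def pdftk_attach_command(input_pdf, attachment, output_pdf)
  if File.expand_path(input_pdf) == File.expand_path(output_pdf)
    raise ArgumentError, "output file must differ from input file"
  end
  ["pdftk", input_pdf, "attach_files", attachment, "output", output_pdf]
end

# Hypothetical helper: run pdftk and return true only if it exits
# successfully. Passing the arguments as separate strings bypasses the
# shell, so file names with spaces need no quoting.
def attach_metadata(input_pdf, attachment, output_pdf)
  system(*pdftk_attach_command(input_pdf, attachment, output_pdf)) == true
end
```

For example, attach_metadata("foo.pdf", "metadata.yaml", "foo-with-attachment.pdf") runs the command shown above.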
You may be able to extract the metadata file using pdf-reader, but you can certainly do it with pdftk. This command unpacks metadata.yaml into the unpacked-attachments directory.
$ pdftk foo-with-attachment.pdf unpack_files output unpacked-attachments
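Once the file is unpacked, a downstream tool written in Ruby could read it back and look up the record for a given page. The sketch below assumes the file has the layout of the sample metadata.yaml; load_metadata and record_for_page are hypothetical names, not part of any library.

```ruby
require "yaml"

# Hypothetical helper: load the unpacked metadata records from disk.
# Symbol has to be permitted explicitly because the keys are stored
# as Ruby symbols (:document_id and so on).
def load_metadata(path)
  YAML.safe_load(File.read(path), permitted_classes: [Symbol])
end

# Hypothetical helper: return the record whose :page_numbers list
# contains the given page, or nil if no record covers that page.
def record_for_page(records, page_number)
  records.find { |record| record[:page_numbers].include?(page_number) }
end
```

With the sample file, record_for_page(load_metadata("unpacked-attachments/metadata.yaml"), 4) would return the record for document '12346'.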