PDF: How to Optimize Filesize & Convert to PNG (em

Posted 2019-07-22 05:03

Question:

I have a PDF with embedded fonts that I can't seem to work with. Right now, I'm using Ghostscript and trying to do two things:

  • Minimize filesize of PDF:

    gswin32c -dSAFER -dBATCH -dNOPAUSE -dQUIET -sDEVICE=pdfwrite -sOutputFile=output.pdf input.pdf

  • Convert PDF to PNG (supersampled, to be used for creating other thumbnails):

    gswin32c -dSAFER -dBATCH -dNOPAUSE -dQUIET -dFirstPage=1 -dLastPage=1 -r288 -sDEVICE=png16m -sOutputFile=output.png input.pdf

The above works well when working on scanned documents. But when I run them against PDFs with embedded fonts (the PDF is generated on the fly by an application), it fails. Here's the error I get:

GPL Ghostscript 8.71: Warning: 'loca' length 274 is greater than numGlyphs 136 in the font UUGHDE+ArialMT.
GPL Ghostscript 8.71: Warning: 'loca' length 274 is greater than numGlyphs 136 in the font UUGHDE+ArialMT.
GPL Ghostscript 8.71: Warning: 'loca' length 188 is greater than numGlyphs 93 in the font UUGHDE+Arial-BoldMT.
GPL Ghostscript 8.71: Warning: 'loca' length 188 is greater than numGlyphs 93 in the font UUGHDE+Arial-BoldMT.

Aside from GhostScript, I also have access to PDFTK and ImageMagick (which might be replaced with GraphicsMagick). I'm also open to other solutions.

Development is on WAMP. Deployment is to LAMP.

Suggestions?

Answer 1:

The fonts used inside your PDFs seem to be OpenType fonts, and the software that created these PDFs seems to have subsetted them. During font embedding and subsetting, this software (which "generates the PDFs on the fly"; was it also Ghostscript?) apparently ran into a problem, so the embedded fonts do not comply 100% with the specification.

A 'loca' table is part of an OpenType font with TrueType outlines. It is an index of the locations of all glyph descriptions in the font's 'glyf' table.
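
To make the idea of an "index to glyph locations" concrete, here is a minimal Python sketch of how a 'loca' table is laid out. It is not Ghostscript's code: the function name parse_loca and the sample bytes are made up for illustration. It relies only on the documented layout: the table holds numGlyphs + 1 big-endian offsets, stored as uint16 values equal to offset / 2 in the short format and as plain uint32 values in the long format, with the format chosen by indexToLocFormat in the 'head' table.

```python
import struct

def parse_loca(data, index_to_loc_format):
    """Return the glyph byte offsets stored in a raw 'loca' table."""
    if index_to_loc_format == 0:
        # Short format: big-endian uint16 values, each holding offset / 2.
        count = len(data) // 2
        return [value * 2 for value in struct.unpack(f">{count}H", data)]
    # Long format: big-endian uint32 values holding the offset directly.
    count = len(data) // 4
    return list(struct.unpack(f">{count}I", data))

# Synthetic short-format table for a hypothetical 3-glyph font:
# 3 glyphs need 4 offsets (numGlyphs + 1).
sample = struct.pack(">4H", 0, 10, 10, 25)
offsets = parse_loca(sample, 0)                      # [0, 20, 20, 50]
# The size of glyph i is offsets[i + 1] - offsets[i]; 0 means an empty glyph.
glyph_lengths = [b - a for a, b in zip(offsets, offsets[1:])]  # [20, 0, 30]
```

A consumer of the font compares the number of entries in this table with the numGlyphs value declared elsewhere in the font, which is the kind of consistency check the warnings above report on.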

Now you process these not completely 'kosher' PDFs with Ghostscript. Ghostscript emits warnings, but no errors.

GS errors usually mean: "I'll abort further processing. I can't work around a problem or repair this corrupt file. Should I have written output files already, they will be useless."

GS warnings usually mean: "I've encountered a problem. But I'll continue to process the input and work around it. I've written a valid output file. But you'd better check it, especially its fidelity!"
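
A script can apply the same distinction automatically. The following Python sketch is only an illustration under stated assumptions: it assumes Ghostscript exits with a non-zero status when it hits an error, and that warning lines contain the text "Warning:" as in the log from the question. The function names are hypothetical; the Ghostscript switches are the ones from the question, with the output file given a .png name.

```python
import subprocess

def build_png_command(gs_exe, pdf_path, png_path, dpi=288):
    """Build the argument list for rendering page 1 of a PDF to PNG."""
    return [
        gs_exe, "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dQUIET",
        "-dFirstPage=1", "-dLastPage=1", f"-r{dpi}",
        "-sDEVICE=png16m", f"-sOutputFile={png_path}", pdf_path,
    ]

def classify_run(returncode, output):
    """Classify a finished run as 'error', 'warning', or 'ok'."""
    if returncode != 0:
        # Assumption: a non-zero exit status means processing was aborted.
        return "error"
    if "Warning:" in output:
        # Output was written, but its fidelity should be checked.
        return "warning"
    return "ok"

def render_first_page(gs_exe, pdf_path, png_path):
    """Run Ghostscript and report how the run ended."""
    proc = subprocess.run(
        build_png_command(gs_exe, pdf_path, png_path),
        capture_output=True, text=True,
    )
    # Scan both streams, since the sketch does not assume which one
    # Ghostscript uses for its warnings.
    return classify_run(proc.returncode, proc.stdout + proc.stderr)
```

For example, render_first_page("gswin32c", "input.pdf", "output.png") on the WAMP machine, or the same call with "gs" as the executable on the LAMP server, would return "warning" for the PDFs described in the question if the assumptions hold.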

The warnings (not errors!) you see mean this:

  1. One of the subsetted fonts in question (UUGHDE+Arial-BoldMT) claims, according to its 'loca' table, to contain 188 glyphs.
  2. But in reality the font description contains definitions for only 93 glyph shapes.
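
To see at a glance which fonts are affected, the warning lines can be summarized with a short script. This is a sketch, not part of Ghostscript: the function name and regular expression are made up here, and it assumes each warning sits on a single line (not wrapped by the console) in exactly the wording shown in the question.

```python
import re

# Matches the warning text from the question's log output.
LOCA_WARNING = re.compile(
    r"'loca' length (\d+) is greater than numGlyphs (\d+) in the font (\S+)\."
)

def summarize_loca_warnings(log_text):
    """Return unique (font, loca_length, num_glyphs) tuples, sorted by font."""
    found = set()
    for match in LOCA_WARNING.finditer(log_text):
        length, num_glyphs, font = match.groups()
        found.add((font, int(length), int(num_glyphs)))
    return sorted(found)

log = """\
GPL Ghostscript 8.71: Warning: 'loca' length 274 is greater than numGlyphs 136 in the font UUGHDE+ArialMT.
GPL Ghostscript 8.71: Warning: 'loca' length 274 is greater than numGlyphs 136 in the font UUGHDE+ArialMT.
GPL Ghostscript 8.71: Warning: 'loca' length 188 is greater than numGlyphs 93 in the font UUGHDE+Arial-BoldMT.
GPL Ghostscript 8.71: Warning: 'loca' length 188 is greater than numGlyphs 93 in the font UUGHDE+Arial-BoldMT.
"""

summary = summarize_loca_warnings(log)
# [('UUGHDE+Arial-BoldMT', 188, 93), ('UUGHDE+ArialMT', 274, 136)]
```

Each warning appears twice in the log, so the set removes the duplicates and leaves one entry per affected font.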