Why is there such a big difference in size between almost identical PDFs?

Posted 2019-08-30 07:02

Question:

I have two PDFs: the first was created with libharu and the second with PDF::API2. Apart from the coordinates, the content is the same, yet the first PDF is four times the size of the second. The only difference I found is the type of font embedding shown on the Fonts tab of the document properties.

In the first:

Verdana (Embedded Subset) 
  Type: TrueType 
  Encoding: Custom

In the second:

Verdana 
  Type: TrueType
  Encoding: Custom
  Actual Font: Verdana
  Actual font Type: TrueType

How do I deal with that embedded subset?

Answer 1:

There are many factors that affect the size of the PDF. Your problem may be in the way the PDF creation libraries handle font embedding, specifically:

  • "Embedded subset" means the file carries the font's actual glyph data (not just metrics such as glyph widths), but only for the characters the document uses.
  • If the font is not embedded, the reader loads it from the system (or substitutes a similar font), which keeps the file smaller.

If the PDF is already small (a single page, little text and no images), embedding fonts may make a relatively big difference in the size of the document. Still, in absolute terms, an embedded subset of a font shouldn't take a lot of space.

Another factor you should check is compression. The object structure of a PDF is plain text, but the streams inside it (page content, embedded fonts) are usually stored compressed. Try opening both PDFs in a plain text editor and see whether the stream contents are readable or gibberish. The gibberish (compressed) form will naturally take less space.
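If you'd rather not eyeball the files, the same check can be roughly automated. The sketch below is only a heuristic and the helper names (count_marker, read_file, report_pdf) are my own, not part of any library: it counts how often the names /FlateDecode (the usual stream compression filter) and /FontFile2 (an embedded TrueType font program) appear in the raw bytes. It can undercount when those names sit inside compressed object streams.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Count non-overlapping occurrences of a marker in a raw byte buffer.
 * PDF files may contain NUL bytes, so strstr() is not suitable here. */
size_t count_marker(const unsigned char *buf, size_t len, const char *marker)
{
    assert(buf != NULL && marker != NULL);
    size_t mlen = strlen(marker);
    size_t count = 0;
    if (mlen == 0 || len < mlen)
        return 0;
    for (size_t i = 0; i + mlen <= len; ) {
        if (memcmp(buf + i, marker, mlen) == 0) {
            count++;
            i += mlen;   /* skip past the match */
        } else {
            i++;
        }
    }
    return count;
}

/* Read a whole file into memory. Returns NULL on failure; caller frees. */
unsigned char *read_file(const char *path, size_t *out_len)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;
    if (fseek(f, 0, SEEK_END) != 0) { fclose(f); return NULL; }
    long size = ftell(f);
    if (size < 0) { fclose(f); return NULL; }
    rewind(f);
    unsigned char *buf = malloc(size > 0 ? (size_t)size : 1);
    if (!buf) { fclose(f); return NULL; }
    size_t got = fread(buf, 1, (size_t)size, f);
    fclose(f);
    if (got != (size_t)size) { free(buf); return NULL; }
    *out_len = got;
    return buf;
}

/* Print a rough summary for one PDF. Returns 0 on success, -1 on error. */
int report_pdf(const char *path)
{
    size_t len = 0;
    unsigned char *buf = read_file(path, &len);
    if (!buf)
        return -1;
    printf("%s: %zu bytes, /FlateDecode x%zu, /FontFile2 x%zu\n",
           path, len,
           count_marker(buf, len, "/FlateDecode"),
           count_marker(buf, len, "/FontFile2"));
    free(buf);
    return 0;
}
```

Calling report_pdf() on each of the two files and comparing the lines should show whether one file lacks compressed streams or carries an embedded font program that the other does not.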

Finally, you can inspect the objects the PDF file is composed of using one of the many PDF inspectors out there (I just googled one up, so no guarantees it'll work as expected).



Answer 2:

This is an old question, but I had a similar issue.

Did you set libharu to compress your pdf?

In C or C++ (libharu is a C library), from the documentation:

HPDF_SetCompressionMode (pdf, HPDF_COMP_ALL);