Any suggestions for shrinking a PDF file?

Posted 2020-05-21 04:17

Question:

We've got a .NET 2.0 web system that dynamically builds PDF files. Some of these files can get pretty large: 12MB or more. Processing time isn't really a factor, but in some cases the size of the download is.

For the moment, let's assume that our B-grade pdf library is already making the smallest files that it knows how. (Although, if anyone has any suggestions on that front, do see this related question.)

However, taking the 12MB file in question and sending it through Acrobat Distiller results in a roughly 700K file, with no appreciable loss in print quality.

I'd love to have some kind of post-processor that does even a third of that. Does anyone have any controls they know about that'll do something like this?

The cheaper the better for this project, but we're not averse to throwing a few bucks down.

(Some preemptive comments: naturally, rewriting the existing PDF generation code with a new tool is off the table at the moment. Also, while Distiller seems to have an API, calling that on a web server doesn't seem like the most efficient course, and Distiller is a little pricey. Finally, we'd just as soon not wrap the PDFs in a zip file or some such, since that may baffle the clients somewhat. No, really.)

Thanks!

Answer 1:

Use Ghostscript, which is also available for 32-bit and 64-bit Windows. It recognises all Adobe Distiller parameters[1] and honors most of them. On top of that, you can inject PostScript programs into the conversion process. I have been using it for a year now in a pre-print production environment on image-heavy PDFs. If the parameters are set correctly, the file size can go from 40MB down to 800kB with no visible loss of quality. I found it to be quite fast; in fact, the documentation states that it may be faster than Adobe Distiller.

And it is free (as in beer as well as in speech).

[1] See distparm.pdf in the help folder of Distiller or look here.

How you use it

You call it from the command line with the parameters you want plus the input and output files, and you're done.

Quick example:

gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite \
   -dCompatibilityLevel=1.3 -dEncodeColorImages=true \
   -sOutputFile=output.pdf input.pdf
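For a web system that would shell out to Ghostscript as a post-processor, the call can be wrapped in a small helper. What follows is a minimal Python sketch under stated assumptions, not the answerer's actual setup: the function names and the 150 dpi target are made up for illustration, and it assumes a `gs` executable on the PATH (on Windows the console binary is usually named `gswin32c` or `gswin64c`). The two downsampling switches are standard Distiller parameters that Ghostscript accepts; whether they help depends on how image-heavy the input is.

```python
import subprocess


def build_gs_command(src, dst, dpi=150, gs="gs"):
    """Return the argument list for a Ghostscript pdfwrite run.

    The dpi default is an illustrative assumption, not a value
    recommended by the original answer.
    """
    return [
        gs,
        "-dBATCH", "-dNOPAUSE", "-dQUIET",
        "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.3",
        "-dEncodeColorImages=true",
        # Ask pdfwrite to downsample high-resolution color images.
        "-dDownsampleColorImages=true",
        f"-dColorImageResolution={dpi}",
        f"-sOutputFile={dst}",
        src,
    ]


def shrink_pdf(src, dst, **options):
    """Run Ghostscript; raises CalledProcessError on a non-zero exit."""
    subprocess.run(build_gs_command(src, dst, **options), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so it can be inspected first.
    print(" ".join(build_gs_command("input.pdf", "output.pdf")))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when file names contain spaces.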

Some valuable resources:

  • Ghostscript PDF writer tips
  • Postscript-to-PDF converter documentation


Answer 2:

PDFs usually use JBIG/JBIG2/JPEG2000 compression. Cvision's PDFCompressor is the best for compressing PDFs.



Answer 3:

There are multiple flavors of PDF with different size/functionality trade-offs. If you are converting text-based documents (Word, Excel, etc.) rather than image documents (TIFF, JPG, BMP, etc.), that would probably explain the smaller file sizes that Distiller gives you. You need to make sure your utility is not just creating image-only PDF files (which are typically much bigger) out of everything. The compression format is also very important, especially for color documents. Look for configuration options that let you tweak those settings. If you mention the specific PDF builder tool, we might be able to give you more specific help on that.
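One way to see which compression formats a generated file actually uses is to count the `/Filter` entries in its stream dictionaries. The sketch below is a rough diagnostic, not a PDF parser: `count_filters` is a hypothetical helper name, and it scans raw bytes with a regular expression, so it can be fooled by binary data that happens to contain the same characters and it ignores inline images.

```python
import re
from collections import Counter

# Matches "/Filter /Name" and "/Filter [/A /B]" entries in stream dictionaries.
FILTER_RE = re.compile(rb"/Filter\s*(\[[^\]]*\]|/[A-Za-z0-9]+)")
NAME_RE = re.compile(rb"/([A-Za-z0-9]+)")


def count_filters(pdf_bytes):
    """Count stream filter names found in the raw bytes of a PDF."""
    counts = Counter()
    for match in FILTER_RE.finditer(pdf_bytes):
        for name in NAME_RE.findall(match.group(1)):
            counts[name.decode("ascii")] += 1
    return counts
```

To use it, read the file in binary mode and pass the bytes in, for example `count_filters(open("report.pdf", "rb").read())`. Many `FlateDecode` image streams and few `DCTDecode` (JPEG) ones in a photo-heavy file would suggest that the library stores images losslessly, which is a common cause of very large output.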

Here is a decent reference on the "flavors" of PDF files:



Answer 4:

Apago have lots of tools for 'tidying up' PDFs

http://www.apagoinc.com/



Answer 5:

File a bug with the maker of your PDF library? If it's open source, pick off a few of the low-hanging fruit (there are probably many) and submit a patch?



Answer 6:

I don't have a specific answer to your question, so I hope that my response is not poor form.

I've used pdftk for a variety of PDF-related tasks. It's easy to use from the shell and I see that it does have a compression feature. You could try it out quickly to see if it's something that would work for post processing for your application.



Answer 7:

Aside from using another library, your best bet is to get your library working right. I left some suggestions on your other post. I'm not aware of any post-process that you would want to run to compress the file.

As an aside, does your webserver allow HTTP gzipped content? Transparent to the end user!

(That being said, short PDF files should be pretty impervious to most compression methods, since images should already be compressed during rendering (and JPEG >> ZIP in this case). If you have a lot of text, though, gzip can help.)
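The difference between text-heavy and image-heavy content can be illustrated with a quick experiment. This is a minimal Python sketch with made-up stand-in data, not a measurement of real PDFs: a repeated line of PDF text operators plays the role of uncompressed text content, and random bytes play the role of image data that is already JPEG-compressed.

```python
import gzip
import os


def gzip_ratio(data):
    """Compressed size divided by original size (lower is better)."""
    return len(gzip.compress(data)) / len(data)


# Repetitive text, standing in for uncompressed PDF text and operators.
text_like = b"BT /F1 12 Tf 72 712 Td (Quarterly report, page 1) Tj ET\n" * 2000

# Random bytes, standing in for image data that is already compressed.
image_like = os.urandom(len(text_like))

print(f"text-like:  {gzip_ratio(text_like):.3f}")
print(f"image-like: {gzip_ratio(image_like):.3f}")
```

The text-like buffer shrinks to a small fraction of its size, while the random buffer does not shrink at all. HTTP gzip is therefore worth enabling only if the generated PDFs carry a lot of uncompressed text or vector content.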



Answer 8:

Don't include entire fonts in the PDF. Taking care of that one can save a few megabytes.
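To check whether a generated file carries subsetted fonts, look at the font names. By PDF convention, a subset font's name starts with a tag of six uppercase letters and a plus sign, as in `ABCDEF+Helvetica`. The sketch below is a rough Python diagnostic rather than a real parser: `list_fonts` is a hypothetical helper name, and it only sees font dictionaries stored uncompressed. In PDF 1.5 and later they may sit inside compressed object streams, so an empty result proves nothing.

```python
import re

# A subset font name carries a tag of six uppercase letters and a plus sign,
# e.g. /BaseFont /ABCDEF+Helvetica.
FONT_RE = re.compile(
    rb"/(?:BaseFont|FontName)\s*/(?:([A-Z]{6})\+)?([^\s/<>\[\]()]+)"
)


def list_fonts(pdf_bytes):
    """Return a sorted list of (font_name, is_subset) pairs seen in the raw bytes."""
    found = set()
    for tag, name in FONT_RE.findall(pdf_bytes):
        found.add((name.decode("latin-1"), bool(tag)))
    return sorted(found)
```

A name without the tag means the font is either embedded whole or not embedded at all, so untagged names in a large file are the ones to investigate first.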



Answer 9:

If your PDF library is making sub-optimal PDFs, then loading and saving the PDF in any other library ought to give you smaller files. PDFNet SDK Type 3 should be up to this task, and at USD 360 it is cheaper than the Adobe PDF Library.



Answer 10:

If you're interested in lossless compression, try my tool Precomp and a file compressor of your choice. Depending on what contents are in your PDF file, Precomp usually enlarges your PDF file so it can be compressed much better afterwards.