Efficient thumbnail generation for a huge PDF file?

Posted 2020-06-16 03:51

Question:

In a system I'm working on, we generate thumbnails as part of the workflow. Sometimes the PDF files are quite large (print size 3 m²) and can contain huge bitmap images.

Are there thumbnail generation programs that are optimized for a small memory footprint when handling such large PDF files?

The resulting thumbnail can be PNG or JPG.

Answer 1:

ImageMagick is what I use for all my CLI graphics, so maybe it can work for you:

convert foo.pdf foo-%d.png

For a three-page PDF, this produces three separate PNG files:

foo-0.png
foo-1.png
foo-2.png

To create only one thumbnail, treat the PDF as if it were an array ([0] is the first page, [1] is the second, etc.):

convert foo.pdf[0] foo-thumb.png

Since you're worried about memory, you can restrict memory usage with the -cache option:

-cache threshold megabytes of memory available to the pixel cache.

Image pixels are stored in memory until threshold megabytes of memory have been consumed. Subsequent pixel operations are cached on disk. Operations to memory are significantly faster but if your computer does not have a sufficient amount of free memory you may want to adjust this threshold value.

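So to thumbnail a PDF file and resize it, you could run this command, which should keep ImageMagick's pixel cache to around 20 MB of memory (the Ghostscript process that ImageMagick uses to rasterize the PDF is not covered by this limit):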

convert -cache 20 foo.pdf[0] -resize 10%x10% foo-thumb.png
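The -cache option quoted above appears to date from older ImageMagick releases. Assuming ImageMagick 6 or later, the same kind of cap is normally set with -limit, which restricts individual resources such as heap memory (memory) and memory-mapped files (map). The sketch below wraps this in a hypothetical helper, thumb_limited; the 20MiB and 40MiB values are illustrative rather than tuned, and setting DRY_RUN=1 prints the command instead of running it.

```shell
# Sketch, assuming ImageMagick 6 or later: cap the pixel cache with -limit
# instead of the older -cache option. Values are illustrative, not tuned.
# thumb_limited is a hypothetical helper name.
# Usage: thumb_limited IN.pdf OUT.png   (DRY_RUN=1 prints the command instead)
thumb_limited() {
  # Build the argument list; "$1[0]" selects the first page, and the quotes
  # stop the shell from treating [0] as a glob pattern.
  set -- convert -limit memory 20MiB -limit map 40MiB "$1[0]" -resize 10% "$2"
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$@"   # show what would run
  else
    "$@"        # run it; pixels beyond the limits go to the disk cache
  fi
}
```

For example, `thumb_limited foo.pdf foo-thumb.png` thumbnails the first page under those caps, and `identify -list resource` shows the limits currently in effect.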

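Or you could use -density to set the resolution at which the PDF page is rasterized. It has to come before the input file, and a lower value gives a smaller image: the default is 72 dpi, so a value such as 10 renders the page directly at low resolution instead of rendering it large and shrinking it afterwards (for a page about 68 inches wide, 10 dpi gives roughly 680 pixels across):

convert -cache 20 -density 10 foo.pdf[0] foo-thumb.png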
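Because PDF page sizes are given in points (72 per inch), the density needed for a thumbnail of a given width can be worked out in advance: density = target width in pixels × 72 / page width in points. The sketch below does that, assuming pdfinfo from poppler-utils is installed to read the first page's size and ImageMagick's convert is available; thumb_density, page_width_pt and make_thumb are hypothetical helper names.

```shell
# Sketch: choose a -density so the first page is rasterized at roughly the
# thumbnail size, rather than at full size and then shrunk.
# Assumes pdfinfo (poppler-utils) and ImageMagick's convert are installed.

# thumb_density WIDTH_PT TARGET_PX
# PDF sizes are in points (72 per inch): dpi = TARGET_PX * 72 / WIDTH_PT,
# rounded up so the rendered page is not narrower than the target.
thumb_density() {
  awk -v w="$1" -v t="$2" 'BEGIN {
    d = t * 72 / w
    printf "%d\n", (d == int(d)) ? d : int(d) + 1
  }'
}

# page_width_pt FILE: first page width in points, from pdfinfo's
# "Page size:  W x H pts" line.
page_width_pt() {
  pdfinfo "$1" | awk '/^Page size:/ { print $3; exit }'
}

# make_thumb IN.pdf OUT.png TARGET_PX
make_thumb() {
  density=$(thumb_density "$(page_width_pt "$1")" "$3") || return 1
  # -density goes before the input file so it controls rasterization;
  # -resize WIDTHx then trims the result to the exact target width.
  convert -density "$density" "$1[0]" -resize "${3}x" "$2"
}
```

For example, `thumb_density 612 100` prints 12 (a US Letter page, 612 pt wide, rendered about 100 pixels across), and `make_thumb foo.pdf foo-thumb.png 400` renders page 1 at the computed density and scales it to 400 pixels wide. Rounding the density up means the final -resize only ever shrinks the image slightly.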


Answer 2:

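Should you care? Current affordable servers have 512 GB of RAM. That is enough to hold an uncompressed 24-bit colour bitmap about 340 inches (roughly 8.7 m) square at 1200 dpi, far more than a 3 m² page, which needs about 20 GB at that resolution. By contrast, the performance hit you take from spilling pixel data to disk is large.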
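The arithmetic behind this is easy to check: an uncompressed bitmap needs (width × dpi) × (height × dpi) × bytes per pixel. A minimal sketch with two hypothetical helpers, assuming 3 bytes per pixel for 24-bit RGB and decimal gigabytes:

```shell
# bitmap_bytes WIDTH_IN HEIGHT_IN DPI BYTES_PER_PIXEL
# Uncompressed size = (width * dpi) * (height * dpi) * bytes per pixel.
bitmap_bytes() {
  awk -v w="$1" -v h="$2" -v dpi="$3" -v bpp="$4" \
    'BEGIN { printf "%.0f\n", (w * dpi) * (h * dpi) * bpp }'
}

# max_square_in MEM_BYTES DPI BYTES_PER_PIXEL
# Side length, in inches, of the largest square bitmap that fits in MEM_BYTES.
max_square_in() {
  awk -v m="$1" -v dpi="$2" -v bpp="$3" \
    'BEGIN { printf "%.0f\n", sqrt(m / bpp) / dpi }'
}
```

`bitmap_bytes 68 68 1200 3` prints 19975680000, about 20 GB for a page of roughly 3 m² (68 in × 68 in), and `max_square_in 512000000000 1200 3` prints 344, the side in inches of the largest such square that fits in 512 GB.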