Using GhostScript to get page size

2019-01-09 04:24发布

问题:

Is it possible to get the page size (from e.g. a PDF document page) using GhostScript? I have seen the "bbox" device, but it returns the bounding box (it differs per page), not the TrimBox (or CropBox) of the PDF pages. (See http://www.prepressure.com/pdf/basics/page_boxes for info about page boxes.) Any other possibility?

回答1:

Meanwhile I found a different method. This one uses Ghostscript only (just as you required). No need for additional third party utilities.

This method uses a little helper program, written in PostScript, shipping with the source code of Ghostscript. Look in the toolbin subdir for the pdf_info.ps file.

The included comments say you should run it like this in order to list fonts used, media sizes used

gswin32c -dNODISPLAY ^
   -q ^
   -sFile=____.pdf ^
   [-dDumpMediaSizes] ^
   [-dDumpFontsUsed [-dShowEmbeddedFonts]] ^
   toolbin/pdf_info.ps

I did run it on a local example file, with commandline parameters that ask for the media sizes only (not the fonts used). Here is the result:

C:\> gswin32c ^
      -dNODISPLAY ^
      -q ^
      -sFile=c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf ^
      -dDumpMediaSizes ^
      C:/gs8.71/lib/pdf_info.ps


  c:\downloads\_IXUS_850IS_ADVCUG_EN.pdf has 146 pages.
  Creator: FrameMaker 6.0
  Producer: Acrobat Distiller 5.0.5 (Windows)
  CreationDate: D:20060817164306Z
  ModDate: D:20060822122024+02'00'

  Page 1 MediaBox: [ 595 842 ] CropBox: [ 419.535 297.644 ]
  Page 2 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  Page 3 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  Page 4 MediaBox: [ 595 842 ] CropBox: [ 297.646 419.524 ]
  [....]


回答2:

Unfortunately it doesn't seem quite easy to get the (possibly different) page sizes (or *Boxes for that matter) inside a PDF with the help of Ghostscript.

But since you asked for other possibilities as well: a rather reliable way to determine the media sizes for each page (and even each one of the embedded {Trim,Media,Crop,Bleed}Boxes) is the commandline tool pdfinfo.exe. This utility is part of the XPDF tools from http://www.foolabs.com/xpdf/download.html . You can run the tool with the "-box" parameter and tell it with "-f 3" to start at page 3 and with "-l 8" to stop processing at page 8.

Example output:

C:\downloads>pdfinfo -box -f 1 -l 3 _IXUS_850IS_ADVCUG_EN.pdf
Creator:        FrameMaker 6.0
Producer:       Acrobat Distiller 5.0.5 (Windows)
CreationDate:   08/17/06 16:43:06
ModDate:        08/22/06 12:20:24
Tagged:         no
Pages:          146
Encrypted:      no
Page    1 size: 419.535 x 297.644 pts
Page    2 size: 297.646 x 419.524 pts
Page    3 size: 297.646 x 419.524 pts
Page    1 MediaBox:     0.00     0.00   595.00   842.00
Page    1 CropBox:     87.25   430.36   506.79   728.00
Page    1 BleedBox:    87.25   430.36   506.79   728.00
Page    1 TrimBox:     87.25   430.36   506.79   728.00
Page    1 ArtBox:      87.25   430.36   506.79   728.00
Page    2 MediaBox:     0.00     0.00   595.00   842.00
Page    2 CropBox:    148.17   210.76   445.81   630.28
Page    2 BleedBox:   148.17   210.76   445.81   630.28
Page    2 TrimBox:    148.17   210.76   445.81   630.28
Page    2 ArtBox:     148.17   210.76   445.81   630.28
Page    3 MediaBox:     0.00     0.00   595.00   842.00
Page    3 CropBox:    148.17   210.76   445.81   630.28
Page    3 BleedBox:   148.17   210.76   445.81   630.28
Page    3 TrimBox:    148.17   210.76   445.81   630.28
Page    3 ArtBox:     148.17   210.76   445.81   630.28
File size:      6888764 bytes
Optimized:      yes
PDF version:    1.4


回答3:

A solution in pure GhostScript PostScript, no additional scripts necessary:

gs -dQUIET -sFileName=path/to/file.pdf -c "FileName (r) file runpdfbegin 1 1 pdfpagecount {pdfgetpage /MediaBox get {=print ( ) print} forall (\n) print} for quit"

The command prints the MediaBox of each page in the PDF as four numbers per line. An example from a 3-page PDF:

0 0 595 841
0 0 595 841
0 0 595 841

Here's a breakdown of the command:

FileName (r) file  % open file given by -sFileName
runpdfbegin        % open file as pdf
1 1 pdfpagecount { % for each page index
  pdfgetpage       % get pdf page properties (pushes a dict)
  /MediaBox get    % get MediaBox value from dict (pushes an array of numbers)
  {                % for every array element
    =print         % print element value
    ( ) print      % print single space
  } forall
  (\n) print       % print new line
} for
quit               % quit interpreter. Not necessary if you pass -dBATCH to gs

Replace /MediaBox with /CropBox to get the crop box.