Are there tools to determine whether a PDF has landscape or portrait orientation?
I have looked at PDFBox and iText for this, but I could not find anything. Please tell me whether they support it.
Extracting the page information with Origami shows that the PDF pages have a rotation of some degrees. Here is what Origami reports:
{:Parent=>#<PDF::Reader::Reference:0x872349c @id=8, @gen=0>, :Type=>:Page,
:Contents=>#<PDF::Reader::Reference:0x8722f24 @id=4, @gen=0>, :Resources=>#<PDF::Reader::Reference:0x870dbd8 @id=2, @gen=0>,
:MediaBox=>[0, 0, 612, 792], :Rotate=>270}
Rotate : 270
What does the 'rotation' actually mean?
The pdfinfo command-line utility lets you see the page size as well as the MediaBox, CropBox, BleedBox, ArtBox and TrimBox values for each and every page. Here I ask for the values of pages 2 to 4 of a specific document:
pdfinfo -box -f 2 -l 4 sample.pdf
Creator: FrameMaker 6.0
Producer: Acrobat Distiller 5.0.5 (Windows)
CreationDate: Thu Aug 17 16:43:06 2006
ModDate: Tue Aug 22 12:20:24 2006
Tagged: no
Form: AcroForm
Pages: 146
Encrypted: no
Page 2 size: 419.535 x 297.644 pts
Page 2 rot: 90
Page 3 size: 297.646 x 419.524 pts
Page 3 rot: 0
Page 4 size: 297.646 x 419.524 pts
Page 4 rot: 0
Page 2 MediaBox: 0.00 0.00 595.00 842.00
Page 2 CropBox: 87.25 430.36 506.79 728.00
Page 2 BleedBox: 87.25 430.36 506.79 728.00
Page 2 TrimBox: 87.25 430.36 506.79 728.00
Page 2 ArtBox: 87.25 430.36 506.79 728.00
Page 3 MediaBox: 0.00 0.00 595.00 842.00
Page 3 CropBox: 148.17 210.76 445.81 630.28
Page 3 BleedBox: 148.17 210.76 445.81 630.28
Page 3 TrimBox: 148.17 210.76 445.81 630.28
Page 3 ArtBox: 148.17 210.76 445.81 630.28
Page 4 MediaBox: 0.00 0.00 595.00 842.00
Page 4 CropBox: 148.17 210.76 445.81 630.28
Page 4 BleedBox: 148.17 210.76 445.81 630.28
Page 4 TrimBox: 148.17 210.76 445.81 630.28
Page 4 ArtBox: 148.17 210.76 445.81 630.28
File size: 6888764 bytes
Optimized: yes
PDF version: 1.4
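If you want to process such a listing in a script, the per-page size and rot lines can be picked out with two regular expressions. The following is only a minimal sketch: the function name parse_pdfinfo and the SAMPLE text are made up for illustration, and it assumes the line layout shown above (other pdfinfo versions may format lines differently).

```python
import re

# A few lines copied from the listing above. In practice the text would come
# from running the pdfinfo command shown above and capturing its output.
SAMPLE = """\
Page 2 size: 419.535 x 297.644 pts
Page 2 rot: 90
Page 3 size: 297.646 x 419.524 pts
Page 3 rot: 0
"""

SIZE_RE = re.compile(r"^Page\s+(\d+)\s+size:\s+([\d.]+) x ([\d.]+) pts", re.M)
ROT_RE = re.compile(r"^Page\s+(\d+)\s+rot:\s+(\d+)", re.M)


def parse_pdfinfo(text):
    """Map page number -> {'size': (width, height), 'rot': degrees}."""
    pages = {}
    for num, width, height in SIZE_RE.findall(text):
        pages.setdefault(int(num), {})["size"] = (float(width), float(height))
    for num, rot in ROT_RE.findall(text):
        pages.setdefault(int(num), {})["rot"] = int(rot)
    return pages


pages = parse_pdfinfo(SAMPLE)
for num in sorted(pages):
    print(num, pages[num])
```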
Note the following about the pdfinfo output:
- *Box values: These are 4 numbers whose unit is the PostScript point (1/72 inch). The first pair gives the coordinates of the lower left corner, the second pair gives the coordinates of the upper right corner.
- MediaBox: Is a required setting for each page inside the PDF.
- TrimBox: Is an optional setting. If it is not explicitly defined, it defaults to the CropBox, which in turn defaults to the MediaBox. Strictly speaking, it is the CropBox that tells PDF viewers (and printer drivers) to render and display only that particular part of the full page; the TrimBox describes the intended size of the finished page after trimming. In this sample both boxes are identical on every page.
- Page size: This info is computed from the distances between the box corners (width = right minus left, height = top minus bottom). Since CropBox and TrimBox are identical here, either one gives the reported size.
- rot: This gives the value of the page rotation. It may be 0, 90, 180 or 270 degrees.
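To make the relation between the box values and the reported page size concrete, here is a small sketch. The helper name box_size is hypothetical; the numbers are the CropBox of page 2 from the listing above.

```python
def box_size(box):
    """Return (width, height) in points for a PDF box [llx, lly, urx, ury]."""
    llx, lly, urx, ury = box
    # abs() guards against boxes whose corners are given in swapped order.
    return (abs(urx - llx), abs(ury - lly))


# CropBox of page 2 as printed by pdfinfo (rounded to two decimals there).
width, height = box_size([87.25, 430.36, 506.79, 728.00])
print(round(width, 2), round(height, 2))
```

The result agrees with the reported "Page 2 size: 419.535 x 297.644 pts" up to the rounding of the printed box values.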
Now, 'landscape' and 'portrait' are defined for a page like this:
- It is regarded as 'landscape' if the width is greater than the height.
- It is regarded as 'portrait' if the height is greater than the width.
- It is undetermined if width and height have the same value.
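These three rules translate directly into code. A minimal sketch (the function name page_shape is made up for illustration), applied to the sizes of pages 2 and 3 from the listing:

```python
def page_shape(width, height):
    """Classify a page by its width and height, ignoring any rotation."""
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "undetermined"


print(page_shape(419.535, 297.644))  # page 2 size
print(page_shape(297.646, 419.524))  # page 3 size
```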
But...
...you can put a non-zero /Rotate value into your PDF source code (which pdfinfo will show as rot: info), so that a 'portrait' PDF page will display as 'landscape' and vice versa;
...you could define a 'landscape' shaped /TrimBox inside a 'portrait' shaped /MediaBox or vice versa, as well as mix it with a non-zero rotation, so that the 'landscape' shaped content will appear in 'portrait' (or upside-down) look...
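One way to account for rotation is to swap width and height whenever the rotation is 90 or 270 degrees before comparing them. The sketch below does that. The function name displayed_orientation is hypothetical, and which box you pass in (MediaBox, CropBox or TrimBox) is your decision; the code does not make that choice for you. In PDF, /Rotate is a clockwise rotation in multiples of 90 degrees.

```python
def displayed_orientation(box, rotate=0):
    """Orientation of a page as a viewer would show it.

    box    -- [llx, lly, urx, ury] in points (whichever box you consider relevant)
    rotate -- value of the page's /Rotate entry, a multiple of 90
    """
    if rotate % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    llx, lly, urx, ury = box
    width, height = abs(urx - llx), abs(ury - lly)
    # A quarter turn (90 or 270) exchanges width and height; 0 and 180 do not.
    if rotate % 180 == 90:
        width, height = height, width
    if width > height:
        return "landscape"
    if height > width:
        return "portrait"
    return "undetermined"


# Page from the question: MediaBox [0, 0, 612, 792] with Rotate 270.
question_page = displayed_orientation([0, 0, 612, 792], 270)
# Page 2 from the pdfinfo listing: CropBox with rot 90.
listing_page2 = displayed_orientation([87.25, 430.36, 506.79, 728.00], 90)
print(question_page, listing_page2)
```

So the page from the question has a portrait shaped MediaBox but is displayed as landscape, while page 2 of the pdfinfo sample has a landscape shaped box that is displayed as portrait.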
Confused about this? Don't worry, many are. The fact is, 'landscape' and 'portrait' aren't clearly and unambiguously defined technical terms. They are just conventions to describe what we see...