I have a bunch of PDF files. I read each one into a byte array when it is requested, and then pass it to an iTextSharp PdfReader instance. I then want to grab the dimensions of each page in pixels. From what I've read so far, PDF files seem to work in points, a point being a configurable unit stored in some kind of dictionary in an element called UserUnit.
After loading my PDF file into a PdfReader, what do I need to do to get the UserUnit for each page (apparently it can vary from page to page), so that I can then get the page dimensions in pixels?
At present I have this code, which grabs the dimensions of each page in "points". I guess I just need the UserUnit, and can then multiply these dimensions by it to get pixels or something similar:
using iTextSharp.text;     // Rectangle
using iTextSharp.text.pdf; // PdfReader

// Create an object to read the PDF from the byte array
PdfReader reader = new PdfReader(file_content);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
    Rectangle dim = reader.GetPageSize(i);
    int[] xy = new int[] { (int)dim.Width, (int)dim.Height }; // page size in "points"
    page_data[objectid + "-" + i] = xy; // "-" (a string) so the key is always concatenated
}
Cheers!
Allow me to quote from my book:
iText in Action - Second Edition, page 9:
FAQ: What is the measurement unit in PDF documents?
Most of the measurements in PDFs are expressed in user space units. ISO-32000-1 (section 8.3.2.3) tells us “the default for the size of the unit in default user space (1/72 inch) is approximately the same as a point (pt), a unit widely used in the printing industry. It is not exactly the same; there is no universal definition of a point.” In short, 1 in. = 25.4 mm = 72 user units (which roughly corresponds to 72 pt).
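As a quick worked check of that relationship (default user unit only), here is one unit in inches and millimetres, and the US Letter page size of 612 × 792 user units converted to inches:

```latex
\[
1\ \text{unit} = \tfrac{1}{72}\ \text{in} = \tfrac{25.4}{72}\ \text{mm} \approx 0.3528\ \text{mm}
\]
\[
612 \times 792\ \text{units} = \tfrac{612}{72} \times \tfrac{792}{72}\ \text{in} = 8.5 \times 11\ \text{in}
\]
```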
On the next page, I explain that it's possible to change the default value of the user unit, and I give an example of how to create a document with pages that have a different user unit.
Now for your question: suppose you have an existing PDF, how do you find which user unit was used? Before we answer this, we need to take a look at ISO-32000-1.
In section 7.7.3.3, "Page Objects", you'll find the description of UserUnit in Table 30, "Entries in a page object":
(Optional; PDF 1.6) A positive number that shall give the size of default user space units, in multiples of 1⁄72 inch. The range of supported values shall be implementation-dependent. Default value: 1.0 (user space unit is 1⁄72 inch).
This key was introduced in PDF 1.6; you won't find it in older files. It's optional, so you won't always find it in every page dictionary. In my book, I also explain that the maximum value of the UserUnit key is 75,000.
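Combining the FAQ with this table entry: a length of n user space units on a page measures n × UserUnit / 72 inches. The C# sketch below does only that arithmetic; the class and method names (UserSpace, ToInches, ToMillimetres) are made up for illustration and are not part of iTextSharp.

```csharp
using System;

// Hypothetical helper (not an iTextSharp API): converts lengths expressed in
// user space units into physical lengths, honouring a page's UserUnit value.
public static class UserSpace
{
    // Default per ISO 32000-1, Table 30: one unit = 1/72 inch.
    public const float DefaultUserUnit = 1.0f;

    // inches = units * UserUnit / 72
    public static float ToInches(float units, float userUnit = DefaultUserUnit)
    {
        if (userUnit <= 0f)
            throw new ArgumentOutOfRangeException(nameof(userUnit), "UserUnit must be positive.");
        return units * userUnit / 72f;
    }

    // 1 inch = 25.4 mm
    public static float ToMillimetres(float units, float userUnit = DefaultUserUnit)
    {
        return ToInches(units, userUnit) * 25.4f;
    }
}
```

For example, UserSpace.ToInches(612f) gives 8.5 (the width of a US Letter page), while on a page with UserUnit 2 the same 612 units span 17 inches.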
Now how to retrieve this value with iTextSharp?
You already have Rectangle dim = reader.GetPageSize(i); which returns the MediaBox. This isn't necessarily the size of the visible part of the page: if a CropBox is defined for the page, viewers will show a smaller area than what you have in xy (but you probably knew that already).
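If you want the area a viewer actually shows, you can get both boxes from iTextSharp: reader.GetPageSize(i) for the MediaBox and reader.GetCropBox(i) for the CropBox (which, as far as I know, falls back to the page size when no CropBox is set). ISO 32000-1 (section 14.11.2) says a CropBox extending beyond the MediaBox is effectively reduced to their intersection. The sketch below shows that arithmetic with a hypothetical PdfBox struct instead of iTextSharp's Rectangle, so it runs without the library; you would build it from a Rectangle's Left, Bottom, Right and Top values.

```csharp
using System;

// Hypothetical helper (not an iTextSharp type): a PDF rectangle in user
// space units, stored as lower-left and upper-right corners.
public readonly struct PdfBox
{
    public readonly float Llx, Lly, Urx, Ury;

    public PdfBox(float x1, float y1, float x2, float y2)
    {
        // Corners may be stored in any order in a PDF, so normalize them.
        Llx = Math.Min(x1, x2); Urx = Math.Max(x1, x2);
        Lly = Math.Min(y1, y2); Ury = Math.Max(y1, y2);
    }

    public float Width => Urx - Llx;
    public float Height => Ury - Lly;

    // Visible region: the intersection of the CropBox with the MediaBox.
    public static PdfBox VisibleArea(PdfBox mediaBox, PdfBox cropBox)
    {
        float llx = Math.Max(mediaBox.Llx, cropBox.Llx);
        float lly = Math.Max(mediaBox.Lly, cropBox.Lly);
        float urx = Math.Min(mediaBox.Urx, cropBox.Urx);
        float ury = Math.Min(mediaBox.Ury, cropBox.Ury);
        // No overlap: return an empty box instead of a negative-size one.
        if (urx < llx) urx = llx;
        if (ury < lly) ury = lly;
        return new PdfBox(llx, lly, urx, ury);
    }
}
```

With a 612 × 792 MediaBox and a CropBox of [36 36 576 756], VisibleArea reports 540 × 720 user units.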
What you need now is the page dictionary, so that you can retrieve the value of the UserUnit key:
PdfDictionary pageDict = reader.GetPageN(i);
PdfNumber userUnit = pageDict.GetAsNumber(PdfName.USERUNIT);
Most of the time userUnit will be null (in which case the default value of 1.0 applies), but if it isn't, you can use userUnit.FloatValue.
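Note that UserUnit only gets you to a physical size; a PDF page has no pixel size until it is rendered, so you also have to choose a resolution. Assuming a target resolution in DPI, pixels = units × UserUnit × DPI / 72. The sketch below is a minimal C# illustration of that; PageRaster.PixelSize is a hypothetical helper, and it takes the UserUnit as a nullable float so that a missing key (userUnit being null) falls back to the default of 1.0.

```csharp
using System;

// Hypothetical helper (not part of iTextSharp): the pixel size a page
// would have if rendered at the given resolution.
public static class PageRaster
{
    // width/height: page size in user space units (e.g. Rectangle.Width/Height).
    // userUnit: the page's /UserUnit value, or null when the key is absent.
    // dpi: the rendering resolution you choose (it is not stored in the PDF).
    public static (int Width, int Height) PixelSize(
        float width, float height, float? userUnit, float dpi)
    {
        // ISO 32000-1, Table 30: UserUnit defaults to 1.0 (1 unit = 1/72 inch).
        float unit = userUnit ?? 1.0f;
        if (unit <= 0f || dpi <= 0f)
            throw new ArgumentException("UserUnit and DPI must be positive.");

        // units * UserUnit / 72 = inches; inches * dpi = pixels (nearest whole pixel).
        int px = (int)Math.Round(width * unit * dpi / 72.0);
        int py = (int)Math.Round(height * unit * dpi / 72.0);
        return (px, py);
    }
}
```

In the question's loop, something like PageRaster.PixelSize(dim.Width, dim.Height, userUnit == null ? (float?)null : userUnit.FloatValue, 96f) would give the page size at 96 DPI; a US Letter page without a UserUnit entry comes out as 816 × 1056 pixels.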