At the moment I am using iText to read the page count of a PDF. This takes quite long because the library seems to scan the whole file.
Is the page count stored somewhere in the header of the PDF, or is a full file scan needed?
The above is the process for counting the PDF pages.
Lars Vogel uses the following code:
I'd be surprised if the implementation of getNumberOfPages is slower than any other solution.
Section F.3.3 says there is a header field called N, described as the number of pages in the document. It belongs to the linearization parameter dictionary, so it is only present in linearized PDFs.
You just need to read the page tree (Catalog, Pages, Kids) and count the Page entries.
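To illustrate the N header field mentioned above, here is a minimal sketch in plain Java (no iText) that looks only at the first 1024 bytes of the file, which is where a linearized PDF keeps its linearization dictionary. The class name LinearizedPageCount and its methods are made up for this example, and the regex-based parsing is an assumption that holds for typical files, not a real PDF parser. It compares the /L entry with the actual file size because a linearized file that was later updated incrementally can carry a stale N.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of iText.
final class LinearizedPageCount {
    // The linearization dictionary has to fit inside the first 1024 bytes.
    private static final int HEAD_SIZE = 1024;

    // Returns the integer value of "/<key> <digits>" in the dictionary text, or -1 if absent.
    private static long numericEntry(String dict, String key) {
        Matcher m = Pattern.compile("/" + key + "\\s+(\\d{1,18})").matcher(dict);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    // Extracts /N from the linearization dictionary found in the file head.
    // Empty if the head is not linearized or /L does not match the file length.
    public static OptionalInt pageCountFromHead(byte[] head, long fileLength) {
        // ISO-8859-1 maps every byte to one char, so binary bytes cannot break decoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        int marker = text.indexOf("/Linearized");
        if (marker < 0) {
            return OptionalInt.empty();
        }
        int open = text.lastIndexOf("<<", marker);
        int close = text.indexOf(">>", marker);
        if (open < 0 || close < 0) {
            return OptionalInt.empty();
        }
        String dict = text.substring(open, close);
        long declaredLength = numericEntry(dict, "L");
        long pages = numericEntry(dict, "N");
        if (declaredLength != fileLength || pages < 0 || pages > Integer.MAX_VALUE) {
            return OptionalInt.empty();
        }
        return OptionalInt.of((int) pages);
    }

    // Reads only the head of the file; requires Java 11+ for readNBytes(int).
    public static OptionalInt pageCount(Path pdf) throws IOException {
        try (InputStream in = Files.newInputStream(pdf)) {
            return pageCountFromHead(in.readNBytes(HEAD_SIZE), Files.size(pdf));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 1) {
            OptionalInt n = pageCount(Paths.get(args[0]));
            System.out.println(n.isPresent()
                    ? "pages: " + n.getAsInt()
                    : "not linearized, fall back to the page tree");
        }
    }
}
```

If the result is empty, the file is not (or no longer) linearized and you still have to go through the page tree, for example via iText's getNumberOfPages.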
That's correct. iText parses quite a bit of a PDF when it is opened (it doesn't read the contents of stream objects, but that's about it), unless you use the PdfReader(RandomAccessFileOrArray) constructor, in which case it will only read the xrefs (mostly required), but not parse anything until you start requesting specific objects (directly or via various calls).
So while not perfectly efficient, it'll be vastly more efficient to use a RandomAccessFileOrArray.
Update:
The iText API underwent a little overhaul. Now (in version 5.4.x) the correct way to use it is to pass through java.io.RandomAccessFile.