I am trying to extract images from a PDF file. I found an example on the web that worked fine:
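    // Assuming iText 5.x; with iText 2.x the same classes live in com.lowagie.text.pdf
    import com.itextpdf.text.pdf.PRStream;
    import com.itextpdf.text.pdf.PdfName;
    import com.itextpdf.text.pdf.PdfObject;
    import com.itextpdf.text.pdf.PdfReader;
    import com.itextpdf.text.pdf.PdfStream;
    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class ExtractImages {
        public static void main(String[] args) throws IOException {
            File file = new File("example.pdf");
            PdfReader reader = new PdfReader(file.getAbsolutePath());
            // Visit every object in the cross-reference table, i.e. in object-number order
            for (int i = 0; i < reader.getXrefSize(); i++) {
                PdfObject pdfobj = reader.getPdfObject(i);
                if (pdfobj == null || !pdfobj.isStream()) {
                    continue;
                }
                PdfStream stream = (PdfStream) pdfobj;
                PdfObject pdfsubtype = stream.get(PdfName.SUBTYPE);
                if (PdfName.IMAGE.equals(pdfsubtype)) {
                    // Raw, still-encoded bytes: a valid .jpg file only when /Filter is /DCTDecode
                    byte[] img = PdfReader.getStreamBytesRaw((PRStream) stream);
                    File target = new File(file.getParentFile(), String.format("%05d.jpg", i));
                    try (FileOutputStream out = new FileOutputStream(target)) {
                        out.write(img);
                    }
                }
            }
            reader.close();
        }
    }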
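The bytes returned by getStreamBytesRaw are still encoded with whatever /Filter the image stream declares, so writing them out under a .jpg name only produces a usable file for /DCTDecode (JPEG) images. A minimal sketch of choosing the extension from the filter name; the class and method names are hypothetical, it assumes a single filter name given without the leading slash, and for simplicity it treats every filter other than DCTDecode as "decode first":

```java
import java.util.Optional;

public class ImageExtension {

    /**
     * Hypothetical helper: the file extension under which raw (still-encoded) image
     * stream bytes can be saved unchanged, or Optional.empty() if they must be decoded.
     * Assumes a single filter name given without the leading slash, e.g. "DCTDecode".
     */
    static Optional<String> extensionFor(String filter) {
        if ("DCTDecode".equals(filter)) {
            return Optional.of("jpg"); // DCTDecode data is a complete JPEG stream
        }
        // Simplification: every other filter (e.g. FlateDecode, whose data is bare
        // pixel samples without any image header) is treated as "decode first"
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(extensionFor("DCTDecode").orElse("decode first"));   // jpg
        System.out.println(extensionFor("FlateDecode").orElse("decode first")); // decode first
    }
}
```

The filter name would come from the image stream's /Filter entry, which can also be an array of filters; that case is not handled here.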
That gave me all the images, but in the wrong order. My next attempt, looping over the pages instead, looked like this:
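    // PdfReader page numbers are 1-based, so the loop starts at 1
    for (int i = 1; i <= reader.getNumberOfPages(); i++) {
        PdfDictionary d = reader.getPageN(i);
        // /Contents refers to the page's content stream (or to an array of content streams)
        PdfIndirectReference ir = d.getAsIndirectObject(PdfName.CONTENTS);
        PdfObject o = reader.getPdfObject(ir.getNumber());
        PdfStream stream = (PdfStream) o;
        // rest from example above
    }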
Although o.isStream() returns true, its dictionary only contains /Length and /Filter, and the stream is only about 100 bytes long. There is no image in it at all.
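A ~100-byte stream reached through /Contents is most likely the page's content stream: a list of drawing operators, not image data. Images are stored as image XObjects in the page's /Resources /XObject dictionary and are painted by name with the Do operator, so the order of the Do operators in the content stream is the order in which the page draws them. A deliberately simplified sketch of pulling those names out of already-decoded content-stream text (the input here is made up); the method name is hypothetical, and it ignores string literals, inline images and nested form XObjects, all of which a real content parser must handle:

```java
import java.util.ArrayList;
import java.util.List;

public class DoOperatorOrder {

    /**
     * Hypothetical helper: returns the XObject names painted by "Do", in content-stream order.
     * Assumes decoded content-stream text with whitespace-separated tokens; a real parser must
     * also handle strings, inline images (BI ... ID ... EI) and tokens not separated by whitespace.
     */
    static List<String> xObjectNamesInPaintOrder(String content) {
        List<String> names = new ArrayList<>();
        String previous = null;
        for (String token : content.trim().split("\\s+")) {
            // Operands precede their operator, so "/Im0 Do" paints the XObject named Im0
            if (token.equals("Do") && previous != null && previous.startsWith("/")) {
                names.add(previous.substring(1));
            }
            previous = token;
        }
        return names;
    }

    public static void main(String[] args) {
        String content = "q 200 0 0 100 50 600 cm /Im2 Do Q\nq 200 0 0 100 50 400 cm /Im0 Do Q";
        System.out.println(xObjectNamesInPaintOrder(content)); // [Im2, Im0]
    }
}
```

Each returned name would still have to be looked up in the page's /Resources /XObject dictionary and checked for /Subtype /Image, because Do also paints form XObjects.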
So my question is: what is the right way to get all the images from a PDF file in the correct order?