I'm trying to write a routine that takes a PDF submitted by a user, extracts each page as an image, and populates an array with those images. I've found several examples that append all pages to one image, but none that do what I need.
This is what I have, but it returns an empty array:
function PdfToImg($pdf_in) {
    $img_array = array();
    $im = new imagick();
    $im->readimageblob($pdf_in); // reading image from binary string
    $num_pages = $im->getnumberimages();
    $im->setimageformat("png");
    for ($x = 1; $x <= $num_pages; $x++) {
        $img = $im->previousimage();
        $img_array .= $img;
    }
    return $img_array;
}
One caveat here is that I can't write these files to disk; I must use strings/arrays. I looked through the ImageMagick manual and didn't find anything about outputting multiple images to an array, only to a series of files saved to disk.
UPDATE: (06/13/2012) I found a way to achieve what I need. It's ugly, inefficient, and I'm sure slow, but there didn't seem to be any other way.
function PdfToImg3($pdf_in) {
    $img_array = array();
    $im = new imagick();
    $im->readimageblob($pdf_in);
    $num_pages = $im->getnumberimages();
    $i = 0;
    for ($x = 1; $x <= $num_pages; $x++) {
        $im = new imagick();
        $im->readimageblob($pdf_in);
        $im->setiteratorindex($i);
        $im->setimageformat('png');
        $img_array[$x] = $im->getimageblob();
        $im->destroy();
        $i++;
    }
    $im->destroy();
    return $img_array;
}
This produces an array named $img_array, keyed from 1, in which each value is one page of the incoming PDF as a string of PNG image data.
There MUST be a better way. Why won't nextImage() work? Why can't I use setIteratorIndex without reinitializing (creating new?) imagick objects each time? I must be missing something, but there are gaping holes in the documentation, and neither Google, the ImageMagick forums, nor StackOverflow turns up anyone doing this successfully.
TESTED: Extremely slow; a simple 17-page PDF takes almost a minute.
UPDATE 2: (07/11/2012) After finishing the larger project that this code bit went into, I decided to return to a few points and improve upon the performance. This is what I came up with:
$img_array = array();
$im = new imagick();
$im->readimageblob($pdf_in);
$num_pages = $im->getnumberimages();
$im->destroy();
$i = 0;
for ($x = 1; $x <= $num_pages; $x++) {
    $im = new imagick();
    $im->readimageblob($pdf_in);
    $im->setResolution(300,300);
    $im->setiteratorindex($i);
    $im->setimageformat('png');
    $img_array[$x] = $im->getimageblob();
    $im->destroy();
    $i++;
}
return $img_array;
This change took the conversion of a complex 4-page PDF from 21-25 seconds down to about 2-3 seconds. I understand why some of the changes helped, but I'm not so clear on the others. Hopefully someone will find this useful.
UPDATE 3: Figured out why performance went up so much: moving setResolution to below readImageBlob causes the DPI setting to be ignored, so rendering falls back to the default of 72. With that in mind, I moved the call back above readImageBlob and reduced it to 150, which gave similar results but still much better performance. See the notes on php.net here.
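A minimal sketch of the ordering described in UPDATE 3, assuming the Imagick extension is installed with a PDF delegate (typically Ghostscript). The function name loadPdfAtDpi and its $dpi parameter are hypothetical, chosen only for illustration:

```php
<?php
// Hypothetical helper (not part of Imagick): rasterize a PDF blob at a chosen DPI.
// Assumes the Imagick extension and a PDF delegate such as Ghostscript are available.
function loadPdfAtDpi(string $pdfBlob, float $dpi = 150.0): Imagick
{
    $im = new Imagick();
    // The resolution has to be set BEFORE the blob is read. Setting it afterwards
    // has no effect on rasterization, so the PDF is rendered at the default 72 DPI.
    $im->setResolution($dpi, $dpi);
    $im->readImageBlob($pdfBlob);
    return $im;
}
```

One quick way to check that the setting took effect is to load the same PDF at 72 and at 150 and compare getImageWidth() on the two objects: the second width should be larger by roughly 150/72.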
Reading and destroying the blobs on every iteration is probably slowing us down a lot. In fact, we do not need to do that at all; the pared-down code looks like this: