How to create image from PDF using PDFBox in JAVA

2019-07-21 05:55发布

问题:

I want to create an image from first page of PDF . I am using PDFBox . After researching in web , I have found the following snippet of code :

public class ExtractImages
 {
    public static void main(String[] args)
    {
        ExtractImages obj = new ExtractImages();
            try 
            {
                obj.read_pdf();
            }

            catch (IOException ex)
            {
                System.out.println("" + ex);
            }

    }

    void read_pdf() throws IOException 
    {
            PDDocument document = null; 
            try 
            {
                document = PDDocument.load("H:\\ct1_answer.pdf");
            }
            catch (IOException ex)
            {
                System.out.println("" + ex);
            }

            List<PDPage>pages =  document.getDocumentCatalog().getAllPages();
            Iterator iter =  pages.iterator(); 

            int i =1;
            String name = null;

            while (iter.hasNext()) 
            {
                PDPage page = (PDPage) iter.next();
                PDResources resources = page.getResources();
                Map pageImages = resources.getImages();
                if (pageImages != null) 
                { 
                    Iterator imageIter = pageImages.keySet().iterator();
                    while (imageIter.hasNext()) {
                        String key = (String) imageIter.next();
                        PDXObjectImage image = (PDXObjectImage) pageImages.get(key);
                        image.write2file("H:\\image" + i);
                        i ++;
                    }
                }
            }

        }

 } 

In the above code there is no error . But the output of this code is nothing . I have expected that the above code will produce a series of image which will be saved in H drive . But there is no image in that code produced from this code . Why ?

回答1:

Without trying to be rude, here is what the code you posted does inside its main working loop:

PDPage page = (PDPage) iter.next();
PDResources resources = page.getResources();
Map pageImages = resources.getImages();

It's getting each page from the PDF file, getting the resources from the page, and extracting the embedded images. It then writes those to disk.

If you are to be a competent software developer you need to be able to research and read documentation. With Java, that means Javadocs. Googling PDPage (or explicitly going to the apache site) turns up the Javadoc for PDPage.

On that page you find two versions of the method convertToImage() for converting the PDPage to an image. Problem solved.

Except ...

Unfortunately, they return a java.awt.image.BufferedImage which based on other questions you have asked is a problem because it is not supported on the Android platform which is what you're working on.

In short, you can't use Apache's PDFBox on Android to do what you're trying to do.

Searching on StackOverflow you find this same question posed several times in different forms, which will lead you to this: https://stackoverflow.com/questions/4665957/pdf-parsing-library-for-android/4766335#4766335 with the following answer that would be of interest to you: https://stackoverflow.com/a/4779852/302916

Unfortunately even the one that the aforementioned answer says will work ... is not very user friendly; there's no "How to" or docs that I can find. It's also labeled as "alpha". This is probably not something for the feint hearted as it's going to require reading and understanding their code to even start using it.



回答2:

I copied your above code and added following libs to my buildpath in eclipse. It is working.

Apache PDFBox 1.7.1 libs

Commons Logging 1.1.1 libs



标签: java pdf pdfbox