Text is missing when converting a PDF file into an image

Posted 2019-06-28 03:43

I want to convert a PDF page to an image file. Text is missing when I convert a PDF page to an image using Java.

The file I want to convert is 46_2.pdf. After conversion, the result looks like 46_2.png.

Code:

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;

import javax.imageio.ImageIO;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;

public class ConvertPDFPageToImageWithoutText {
    public static void main(String[] args) {
        try {
            String oldPath = "C:/PDFCopy/46_2.pdf";
            File oldFile = new File(oldPath);
            if (oldFile.exists()) {
                PDDocument document = PDDocument.load(oldPath);
                List<PDPage> list = document.getDocumentCatalog().getAllPages();

                int pageNumber = 1;
                for (PDPage page : list) {
                    BufferedImage image = page.convertToImage();
                    // One file per page, so later pages do not overwrite earlier ones
                    File outputfile = new File("C:/PDFCopy/image_" + pageNumber + ".png");
                    ImageIO.write(image, "png", outputfile);
                    pageNumber++;
                }
                // Close the document once, after all pages have been rendered
                document.close();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}

3 Answers
闹够了就滚
Reply #2 · 2019-06-28 04:05

Since you're using PDFBox, try using PDFImageWriter.writeToImage instead of PDPage.convertToImage. This post seems relevant to what you are trying to do.

淡お忘
Reply #3 · 2019-06-28 04:16

Use the latest version of PDFBox (I am using 2.0.9) and add the JAI Image I/O dependency from here. This is sample code that runs on Java 7.

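import java.awt.image.BufferedImage;
import java.io.File;

import javax.imageio.ImageIO;

import org.apache.commons.io.FilenameUtils;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.ImageType;
import org.apache.pdfbox.rendering.PDFRenderer;

// "logger" is assumed to be a field of the enclosing class (not shown here)
public void pdfToImageConvertorUsingPdfBox(String inputPdfPath) throws Exception {
    File sourceFile = new File(inputPdfPath);
    String formatName = "png";
    if (sourceFile.exists()) {
        // try-with-resources closes the document even if rendering fails
        try (PDDocument document = PDDocument.load(sourceFile)) {
            PDFRenderer pdfRenderer = new PDFRenderer(document);
            int count = document.getNumberOfPages();

            for (int i = 0; i < count; i++) {
                // Render page i (zero-based) at 200 DPI as an RGB image
                BufferedImage image = pdfRenderer.renderImageWithDPI(i, 200, ImageType.RGB);
                String output = FilenameUtils.removeExtension(inputPdfPath) + "_" + (i + 1) + "." + formatName;
                ImageIO.write(image, formatName, new File(output));
            }
        }
    } else {
        logger.error(sourceFile.getName() + " does not exist");
    }
}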
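
The code above relies on Commons IO only to build the per-page output name. If you would rather not add that dependency, the same naming scheme can probably be done with plain string operations. This is just a sketch: PageImageNames and outputName are hypothetical names made up for illustration, not part of PDFBox or Commons IO, and the extension handling is simpler than what FilenameUtils does.

```java
// Hypothetical helper for illustration only; not part of PDFBox or Commons IO.
class PageImageNames {

    // Builds "<input path without extension>_<1-based page number>.<format>".
    static String outputName(String inputPdfPath, int pageIndex, String formatName) {
        int lastSeparator = Math.max(inputPdfPath.lastIndexOf('/'), inputPdfPath.lastIndexOf('\\'));
        int lastDot = inputPdfPath.lastIndexOf('.');
        // Treat the dot as an extension separator only if it belongs to the file name,
        // not to a directory such as "my.dir".
        String base = lastDot > lastSeparator ? inputPdfPath.substring(0, lastDot) : inputPdfPath;
        return base + "_" + (pageIndex + 1) + "." + formatName;
    }

    public static void main(String[] args) {
        // Page index 0 of the file from the question
        System.out.println(outputName("C:/PDFCopy/46_2.pdf", 0, "png"));
    }
}
```

Inside the rendering loop you would then call outputName(inputPdfPath, i, formatName) in place of the FilenameUtils expression.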
看我几分像从前
Reply #4 · 2019-06-28 04:24

I had the same problem. I found an article (unfortunately I can't remember where, because I've read hundreds of them) whose author complained that such problems appeared in PDFBox after they updated Java to version 7.21. So I'm using 7.17 and it works for me :)
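
If you want to check whether the Java version is a factor on your machine, a quick way is to print which runtime actually executes your code, since it may differ from the one you think is installed. A minimal sketch using only standard system properties (JavaVersionCheck is a made-up class name):

```java
// Illustrative class name; uses only standard JDK system properties.
class JavaVersionCheck {

    // Returns the version string of the running JVM.
    static String runtimeVersion() {
        return System.getProperty("java.version");
    }

    public static void main(String[] args) {
        System.out.println("Java runtime: " + runtimeVersion());
        System.out.println("Vendor: " + System.getProperty("java.vendor"));
    }
}
```

Run it with the same java executable that runs the conversion, then compare the output before and after switching versions.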
