How to read a PDF file in Java

Posted 2020-08-02 04:34

Question:

I am working on a Java project that needs to read a PDF file.

I know this is possible with external libraries such as iText.

But is it possible to read a PDF file using only Java's built-in features, without any external library?

Answer 1:

Yes, it is possible to read a PDF file from Java. Have a look at Apache PDFBox (note that it is itself an open-source library that you add to the project, not a part of the JDK). PDFBox allows the creation of new PDF documents, the manipulation of existing documents, and the extraction of content from documents. Apache PDFBox also includes several command-line utilities.
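To illustrate how far the JDK alone gets you: as far as I know, the standard library ships no PDF parser, so built-in features can read the raw bytes of the file but not its text or layout. The sketch below only checks the header that a well-formed PDF starts with (`%PDF-` followed by a version such as `1.7`). The class name `PdfHeaderReader` and its methods are hypothetical helpers made up for this illustration, not part of any library.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.Optional;

/** Hypothetical helper: inspects only the PDF header, using nothing but the JDK. */
final class PdfHeaderReader {

    private static final String MAGIC = "%PDF-";

    /** Returns the version after "%PDF-" (e.g. "1.7"), or empty if the bytes do not start with that marker. */
    static Optional<String> parseVersion(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so binary data cannot break decoding.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        if (!text.startsWith(MAGIC)) {
            return Optional.empty();
        }
        int end = MAGIC.length();
        while (end < text.length()) {
            char c = text.charAt(end);
            if ((c < '0' || c > '9') && c != '.') {
                break;
            }
            end++;
        }
        return Optional.of(text.substring(MAGIC.length(), end));
    }

    /** Reads up to the first 16 bytes of the file and parses them as a PDF header. */
    static Optional<String> readVersion(Path file) throws IOException {
        try (InputStream in = Files.newInputStream(file)) {
            byte[] head = new byte[16];
            int filled = 0;
            // A single read() may return fewer bytes than requested, so loop until full or EOF.
            while (filled < head.length) {
                int n = in.read(head, filled, head.length - filled);
                if (n < 0) {
                    break;
                }
                filled += n;
            }
            return parseVersion(Arrays.copyOf(head, filled));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.out.println("usage: java PdfHeaderReader <file.pdf>");
            return;
        }
        Optional<String> version = readVersion(Paths.get(args[0]));
        System.out.println(version.map(v -> "PDF version " + v).orElse("not a PDF header"));
    }
}
```

Anything beyond this (decoding compressed streams, fonts, and text positions) would mean re-implementing a large part of the PDF format by hand, which is why a library is the usual choice.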



Answer 2:

You can extract the text of a PDF file with Apache PDFBox. In a Maven project, add this dependency to pom.xml:

<dependency>
    <groupId>org.apache.pdfbox</groupId>
    <artifactId>pdfbox</artifactId>
    <version>2.0.8</version>
</dependency>

The code:

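import java.io.File;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.text.PDFTextStripper;

// DLFileEntry and DLFileEntryLocalServiceUtil are Liferay classes; folder, fileName and
// themeDisplay come from the surrounding portlet code. Any java.io.File works for PDFBox.
try {
    DLFileEntry fileEntry = DLFileEntryLocalServiceUtil.getFileEntry(folder.getGroupId(), folder.getFolderId(), fileName);
    File file = DLFileEntryLocalServiceUtil.getFile(themeDisplay.getUserId(), fileEntry.getFileEntryId(), fileEntry.getVersion(), true);
    // try-with-resources closes the document even if text extraction fails
    try (PDDocument pddDocument = PDDocument.load(file)) {
        PDFTextStripper textStripper = new PDFTextStripper();
        String text = textStripper.getText(pddDocument);
        // text now holds the plain-text content of all pages
    }
} catch (Exception e) {
    // Covers Liferay lookup failures as well as PDFBox I/O errors
    e.printStackTrace();
}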

To read/create a PDF, see the documentation:

https://pdfbox.apache.org/



Tags: java pdf