Read PDF using iText

Posted 2019-01-24 14:30

Question:

I am having trouble reading PDF files with iText in Java. I can read one page, but when I move on to the second page an exception is thrown. I want to read all the pages of any PDF file.

PdfTextExtractor parser =new PdfTextExtractor(new PdfReader("C:/Text.pdf"));
parser.getTextFromPage(3);

These are the lines I am using; the second line throws the exception.

Answer 1:

  1. Try changing the file location. Sometimes the OS does not allow other applications to read files from certain system drives. Put the file somewhere on D:, for example. I ran into this problem on Vista when reading files from the desktop.

  2. I ran the same two lines of code on one of my own PDFs and the text printed fine. Also make sure the PDF has enough pages (3 or more), or try parser.getTextFromPage(1) and so on to get the content of other pages.
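
The page-count check from point 2 can be sketched as a small guard that runs before any extraction call. This is only an illustration in plain Java: the class name PageRangeCheck and both helper methods are hypothetical, not part of iText, and the page count is hard-coded here as an assumption. In real code it would presumably come from the reader's getNumberOfPages().

```java
public class PageRangeCheck {

    /**
     * Returns true when pageNumber is a valid 1-based page index
     * for a document that has pageCount pages.
     */
    static boolean isValidPage(int pageNumber, int pageCount) {
        return pageNumber >= 1 && pageNumber <= pageCount;
    }

    /** Fails early with a descriptive message when the page is out of range. */
    static void requireValidPage(int pageNumber, int pageCount) {
        if (!isValidPage(pageNumber, pageCount)) {
            throw new IllegalArgumentException(
                "Page " + pageNumber + " requested, but the document has "
                + pageCount + " page(s); valid pages are 1.." + pageCount);
        }
    }

    public static void main(String[] args) {
        // Assumed value for the demo; a real program would ask the PDF reader.
        int pageCount = 2;

        System.out.println(isValidPage(1, pageCount)); // prints "true"
        System.out.println(isValidPage(3, pageCount)); // prints "false"
    }
}
```

Calling requireValidPage(3, pageCount) just before parser.getTextFromPage(3) would turn an obscure failure on a two-page file into a message that names the valid range.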



Answer 2:

When you say one page, do you mean the first page? You might be indexing the pages incorrectly. Without more information it could be anything.



Answer 3:

Are you re-constructing the parser and reader for each operation? You can do that, but it's not very efficient (there is a lot of overhead with creating a new PdfReader).



Answer 4:

import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfTextExtractor;

/**
 * This class is used to read an existing
 *  pdf file using iText jar.
 * @author javawithease
 */
public class PDFReadExample {
  public static void main(String[] args) {
    try {
      // Create the PdfReader instance.
      PdfReader pdfReader = new PdfReader("D:\\testFile.pdf");

      // Get the number of pages in the PDF.
      int pages = pdfReader.getNumberOfPages();

      // Iterate over the pages, starting at page 1.
      for (int i = 1; i <= pages; i++) {
        // Extract the page content using PdfTextExtractor.
        String pageContent =
            PdfTextExtractor.getTextFromPage(pdfReader, i);

        // Print the page content to the console.
        System.out.println("Content on Page "
            + i + ": " + pageContent);
      }

      // Close the PdfReader.
      pdfReader.close();
    } catch (Exception e) {
      e.printStackTrace();
    }
  }
}


Tags: java itext