Reading PDF Document using iText in Android

Posted 2019-08-14 21:00

Question:

I am currently testing samples for reading a PDF with iText on Android, but I have a problem. The code below does not display anything in the Android emulator:

public void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);        
    AssetManager assetManager = getAssets();
    InputStream istr = null;
    PdfReader reader=null;
    String str= null;
    try {
         istr =(InputStream) assetManager.open("resume.pdf");
         reader=new PdfReader(istr);
         str = PdfTextExtractor.getTextFromPage(reader, 1).toString();
         //str=reader.getPageContent(1).toString();
    }
    catch (Exception e)
    {
        e.printStackTrace();
    }
    TextView tv = (TextView) findViewById(R.id.txtview);
    tv.setText(str);
}

The code runs, but it does not display the contents of the PDF.

I think the problem is that it is not opening the PDF document properly?

My goal is to extract the text from a PDF document, store it in a variable in the code, and then display it.

I am using iText version 5.3.3.

Answer 1:

If your PDF was produced by a PDF generator, so that it contains real text and is NOT a scanned document or other image, this should do it:

    // Requires: java.io.IOException, com.itextpdf.text.pdf.PdfReader,
    // com.itextpdf.text.pdf.parser.PdfTextExtractor
    StringBuilder content = new StringBuilder();
    PdfReader reader = null;
    try {
        // fileName is the path to your .pdf file, for example resources/pdfs/preface.pdf
        reader = new PdfReader(fileName);
        int numberOfPages = reader.getNumberOfPages();
        // iText page numbers start at 1, so loop from 1 to numberOfPages inclusive
        for (int page = 1; page <= numberOfPages; page++) {
            content.append(PdfTextExtractor.getTextFromPage(reader, page));
        }
    } catch (IOException e) {
        // Opening the file or extracting a page failed
        e.printStackTrace();
    } finally {
        if (reader != null) {
            reader.close();
        }
    }

Now content.toString() contains the text of the PDF.
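
Since this only works for PDFs that contain real text, it can help to check whether extraction produced anything at all. The following is a minimal sketch using only the standard library. ExtractedTextCheck and hasExtractableText are hypothetical names, not part of iText, and the list is assumed to hold one extracted string per page. An all-blank result is only a hint that the file may be a scan or image, not proof.

```java
import java.util.List;

/** Hypothetical helper for illustration; not part of iText. */
final class ExtractedTextCheck {

    /** Returns true if at least one page yielded non-whitespace text. */
    static boolean hasExtractableText(List<String> pageTexts) {
        for (String text : pageTexts) {
            // Treat null, empty, and whitespace-only pages as "no text"
            if (text != null && !text.trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }
}
```

If this returns false for every page of your file, the PDF probably needs OCR rather than text extraction.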

EDIT: You could also first try removing the toString() call in this line, since getTextFromPage already returns a String: str = PdfTextExtractor.getTextFromPage(reader, 1).toString();
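
One possible reason the screen stays empty in the question's code: if anything in the try block throws, the exception is only printed to the log, str stays null, and the TextView then shows nothing. A minimal sketch of making such failures visible is below. SafeExtraction and textOrError are hypothetical names, not part of iText or the Android SDK, and the message wording is just an example.

```java
import java.util.concurrent.Callable;

/** Hypothetical helper for illustration; not part of iText or the Android SDK. */
final class SafeExtraction {

    /**
     * Runs the extraction step and always returns a non-null string to display:
     * the extracted text on success, or a short error description on failure.
     */
    static String textOrError(Callable<String> extraction) {
        try {
            String text = extraction.call();
            return text != null ? text : "(extraction returned null)";
        } catch (Exception e) {
            // Include the exception type and message so the cause is visible on screen
            return "Could not read PDF: " + e;
        }
    }
}
```

In onCreate you could wrap the open-and-extract steps in a Callable and pass the result of textOrError to tv.setText, so a missing asset or a parsing error shows up in the emulator instead of a blank view.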