iText as text Extracting/Reading from PDF on andro

Posted 2019-07-15 19:57

Question:

I'm having a problem with iText. Other people say that iText is for PDF creation only and that it cannot read or extract text from a PDF. Is that true?

If it is true, what other options do I have to extract text from a PDF file and save it in a variable or display it on an Android device?

If iText is capable of extracting text from a PDF, then how?

Answer 1:

iText can extract text from PDFs. While it is true that it originated as a tool to create new PDFs and manipulate existing ones, in recent years it has also become better and better at extracting text. This means that you should use a current iText version (5.3.x) for text extraction.

The book "iText in Action, second edition" by the main iText developer, Bruno Lowagie, explains basic iText text extraction in chapter 15, and the samples from that chapter are available in the iText SourceForge SVN repository (see the samples for chapter 15). A good starting point is ExtractPageContentSorted2, which extracts the text of a whole page.

If you have special requirements, you can use ExtractPageContentSorted1 as a starting point; it explicitly defines a text extraction strategy, and depending on your requirements you will need your own strategy. If you want the text from a specific region only, look at ExtractPageContentArea.
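
To give a feeling for what a "sorted" extraction strategy has to do, here is a minimal sketch in plain Java. It is not iText code and does not reproduce the book samples: the `ChunkSorter` class and its `Chunk` type are hypothetical names made up for illustration. The assumption is that a strategy receives pieces of text together with the position where they are drawn, and that it must bring them into reading order before joining them.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these types are part of iText.
class ChunkSorter {

    /** A piece of text plus the position where it is drawn on the page. */
    static final class Chunk {
        final String text;
        final float x;
        final float y;

        Chunk(String text, float x, float y) {
            this.text = text;
            this.x = x;
            this.y = y;
        }
    }

    /**
     * Orders chunks top to bottom, then left to right, and joins them.
     * PDF coordinates grow upwards, so a larger y is higher on the page.
     */
    static String toReadingOrder(List<Chunk> chunks) {
        List<Chunk> sorted = new ArrayList<>(chunks);
        sorted.sort(Comparator
                .comparingDouble((Chunk c) -> -c.y)
                .thenComparingDouble(c -> c.x));

        StringBuilder out = new StringBuilder();
        for (int i = 0; i < sorted.size(); i++) {
            Chunk c = sorted.get(i);
            if (i > 0) {
                // Simplification: chunks with exactly the same y share a line.
                // A real strategy would need some tolerance here.
                out.append(c.y == sorted.get(i - 1).y ? " " : "\n");
            }
            out.append(c.text);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<Chunk> chunks = new ArrayList<>();
        // Chunks in an order a content stream might happen to store them.
        chunks.add(new Chunk("world", 120f, 700f));
        chunks.add(new Chunk("Second line", 50f, 680f));
        chunks.add(new Chunk("Hello", 50f, 700f));
        System.out.println(toReadingOrder(chunks));
    }
}
```

A custom strategy for special requirements would typically change exactly these two decisions: how chunks are ordered, and when two neighbouring chunks count as belonging to the same line.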

To really fine-tune the text extraction capabilities of iText, you should have a look at the itext-question mailing list archive (e.g. at nabble.com), as the iText text extraction API was recently extended to serve additional use cases.



Answer 2:

Use the code below to extract text from a PDF:


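import java.io.File;
import java.io.FileInputStream;
import java.io.StringWriter;
import com.itextpdf.text.pdf.PdfReader;
import com.itextpdf.text.pdf.parser.PdfReaderContentParser;
import com.itextpdf.text.pdf.parser.SimpleTextExtractionStrategy;
import com.itextpdf.text.pdf.parser.TextExtractionStrategy;

// Path of the selected PDF file (data is presumably the Intent passed to onActivityResult)
String path = data.getData().getPath();
File f = new File(path);

PdfReader reader = new PdfReader(new FileInputStream(f));
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
StringWriter strw = new StringWriter();

// j is the page number; assuming all pages are wanted, loop from 1 to the
// page count (iText page numbers start at 1)
for (int j = 1; j <= reader.getNumberOfPages(); j++) {
    TextExtractionStrategy strategy =
            parser.processContent(j, new SimpleTextExtractionStrategy());
    strw.write(strategy.getResultantText());
}
reader.close();

String da = strw.toString();

// Show the extracted text in the EditText
edt1.setText(da);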