Search for text in PDF files [closed]

Posted 2019-09-04 00:45

Question:

I have a list of about 86 words and some PDF files. I would like to search for those words in the PDF files and get back values that tell me whether they occur.

While researching solutions in tutorials, I ran into two problems:

  1. Am I forced to convert the PDF file to a file?

  2. What is a simple library that lets me solve my problem? I'm really stuck because there are a lot of options (pdfbox, Apache Lucene, iText, pdftron ...)

Answer 1:

Am I forced to convert the PDF file to a file

A PDF file is a file, so you do not have to convert it. You just have to be able to read it. You can use one of the available Java PDF parsers (e.g. pdfbox, as you mentioned).
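Once the text has been read out of the PDF, the word check itself is plain string matching. Here is a minimal sketch of that step, assuming the text has already been extracted into a `String` (for example with PDFBox's `PDFTextStripper`). The class `WordSearch` and the method `findWords` are hypothetical names used only for illustration, and whole-word, case-insensitive matching is an assumption about what counts as a match.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper; the class and method names are illustrative only.
class WordSearch {

    // For each word, reports whether it occurs in the text as a whole word,
    // ignoring case. The text is assumed to have been extracted from the PDF
    // beforehand by a PDF parser; this method does not read PDF files itself.
    static Map<String, Boolean> findWords(String text, List<String> words) {
        Map<String, Boolean> found = new LinkedHashMap<>();
        for (String word : words) {
            // Pattern.quote keeps any regex metacharacters in the word literal.
            Pattern pattern = Pattern.compile(
                    "\\b" + Pattern.quote(word) + "\\b",
                    Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
            found.put(word, pattern.matcher(text).find());
        }
        return found;
    }

    public static void main(String[] args) {
        // Stand-in for text extracted from a PDF.
        String text = "Invoice total: 42 EUR. Payment due in 30 days.";
        Map<String, Boolean> result =
                findWords(text, List.of("invoice", "refund", "payment"));
        result.forEach((word, hit) -> System.out.println(word + " -> " + hit));
    }
}
```

With 86 words, compiling one pattern per word is cheap. If partial matches should count too, a lowercase `contains` check would be a simpler alternative.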

What is a simple library that lets me solve my problem...

Since you have only 86 words and one document, you probably do not need an indexing tool like Lucene. However, if you want to build an application that supports different targets and different documents (especially if you need real free-text search), you probably need Lucene (or Solr) to index your documents first and then search using the index.
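To show what "index first, then search" means, here is a toy in-memory inverted index. It is not Lucene's API and leaves out everything Lucene adds (analysis, ranking, persistence). The class `TinyIndex` and its methods are hypothetical, and the document text is assumed to have been extracted from the PDFs already.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Toy inverted index for illustration only; not how Lucene is used.
class TinyIndex {

    // Maps each lowercase token to the names of the documents containing it.
    private final Map<String, Set<String>> postings = new HashMap<>();

    // Indexing step: split the text into tokens once and record where each occurs.
    void add(String docName, String text) {
        String[] tokens = text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+");
        for (String token : tokens) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, k -> new TreeSet<>()).add(docName);
            }
        }
    }

    // Search step: a single map lookup, with no rescanning of the documents.
    Set<String> search(String word) {
        return postings.getOrDefault(
                word.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("a.pdf", "Invoice total due in 30 days");
        index.add("b.pdf", "Refund of the invoice");
        System.out.println(index.search("invoice"));
        System.out.println(index.search("refund"));
    }
}
```

The indexing cost is paid once per document, and each later query is a lookup. That trade-off only pays off with many documents or many queries, which is why a direct scan is enough for a single document and 86 words.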