This question already has an answer here:
- text-mine PDF files with Python? 2 answers
I have 400 or more PDF files that together form a single text. It's like a book separated page by page. I need to be able to programmatically search for some keywords over the whole text.
So my first question is: is it better to search page by page or join all the PDFs in one big file first and then perform the search?
The second one is: what is the best way to do it? Is there already a good program or library out there?
By the way, I'm using only PHP and Python.
Use PyPdf, as described here.
It is faster and much simpler to search them one by one: you can simply loop over all the files and run the same code on each one, without a separate merge step first.
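A rough sketch of that loop, in case it helps. It assumes the third-party `pypdf` package (the maintained successor to pyPdf) and that the files sort into page order by name (e.g. zero-padded names). `extract_pages` and `search_pdfs` are just names made up for this example, and text extraction quality will depend on how the PDFs were produced.

```python
from pathlib import Path


def extract_pages(pdf_path):
    """Yield the text of each page of one PDF (assumes pypdf is installed)."""
    # Imported here so the search loop below has no hard dependency on pypdf.
    from pypdf import PdfReader

    reader = PdfReader(str(pdf_path))
    for page in reader.pages:
        # extract_text() can come back empty for image-only pages.
        yield page.extract_text() or ""


def search_pdfs(folder, keywords, extract=extract_pages):
    """Search every *.pdf in folder, one file at a time.

    Returns a list of (file name, page number, keyword) tuples.
    Matching is a case-insensitive substring test.
    """
    lowered = [k.lower() for k in keywords]
    hits = []
    for pdf_path in sorted(Path(folder).glob("*.pdf")):
        for page_no, text in enumerate(extract(pdf_path), start=1):
            text = text.lower()
            for kw in lowered:
                if kw in text:
                    hits.append((pdf_path.name, page_no, kw))
    return hits
```

You would call it as something like `search_pdfs("book_pages", ["keyword one", "keyword two"])`, where `book_pages` is a placeholder for your folder. Because each hit records the file name, you also get for free which "page" of the book matched, which you would lose if you merged everything first.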