How can I extract text from a PDF document using PHP?
(I can't install other tools because I don't have root access.)
I've found some functions that work for plain text, but they don't handle Unicode characters well:
http://www.hashbangcode.com/blog/zend-lucene-and-pdf-documents-part-2-pdf-data-extraction-437.html
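One common reason simple extractors mangle non-ASCII text is that the PDF specification lets text strings (for example metadata such as /Title) be stored as UTF-16BE prefixed with the byte-order mark FE FF. Text drawn on a page is different: it is encoded according to the font and often needs the font's /ToUnicode CMap to map back to Unicode. Regex-based functions usually ignore both cases. Below is a minimal pure-PHP sketch of the first case only. The function names `pdfTextStringToUtf8` and `codepointToUtf8` are hypothetical, and strings without the BOM are passed through unchanged, which only approximates PDFDocEncoding.

```php
<?php
// Sketch: convert a PDF "text string" to UTF-8 without mbstring/iconv.
// Assumption: only strings starting with the UTF-16BE byte-order mark
// FE FF are decoded; other strings are returned unchanged (real
// PDFDocEncoding differs from ASCII/Latin-1 in a few positions).
function pdfTextStringToUtf8(string $bytes): string
{
    if (strncmp($bytes, "\xFE\xFF", 2) !== 0) {
        return $bytes;
    }
    $out = '';
    $len = strlen($bytes);
    for ($i = 2; $i + 1 < $len; $i += 2) {
        $unit = (ord($bytes[$i]) << 8) | ord($bytes[$i + 1]);
        // A high surrogate followed by a low surrogate encodes one
        // code point above U+FFFF.
        if ($unit >= 0xD800 && $unit <= 0xDBFF && $i + 3 < $len) {
            $low = (ord($bytes[$i + 2]) << 8) | ord($bytes[$i + 3]);
            if ($low >= 0xDC00 && $low <= 0xDFFF) {
                $unit = 0x10000 + (($unit - 0xD800) << 10) + ($low - 0xDC00);
                $i += 2;
            }
        }
        if ($unit >= 0xD800 && $unit <= 0xDFFF) {
            $unit = 0xFFFD; // unpaired surrogate: U+FFFD replacement character
        }
        $out .= codepointToUtf8($unit);
    }
    return $out;
}

// Encode a single Unicode code point as UTF-8 bytes.
function codepointToUtf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12)) . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18)) . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F)) . chr(0x80 | ($cp & 0x3F));
}
```

For example, `echo pdfTextStringToUtf8("\xFE\xFF\x00H\x00\xE9");` prints "Hé". Text inside page content streams usually needs font-aware decoding instead, which this sketch does not attempt.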
Download class.pdf2text.php from https://pastebin.com/dvwySU1a (updated on 5 April 2014) or from http://www.phpclasses.org/browse/file/31030.html (registration required).
Code:
The class doesn't work with every PDF I've tested, but give it a try and you may get lucky :)
If the above doesn't work, try http://pdfparser.org/
Python version
I know this topic is quite old, but the need is still there. I read many documents, forum posts and scripts, and built a new, more advanced script that supports both compressed and uncompressed PDFs:
https://gist.github.com/smalot/6183152
Hope it helps everyone.
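A common approach to handling both compressed and uncompressed content is: find each stream, inflate it if it is FlateDecode (zlib) data, then pull the string operands of the text-showing operators Tj and TJ out of the content stream. Below is a rough sketch of that idea. It is not the gist's code, the function names are hypothetical, it needs PHP 7.4+ with the zlib extension, and it makes the simplifying assumptions listed in its comments, so expect it to fail on many real-world files.

```php
<?php
// Sketch of a regex-based PDF text extractor (hypothetical helper names).
// Assumptions: the file is not encrypted; content streams are either
// uncompressed or FlateDecode (zlib); text is shown with literal strings
// "(...)" (hex strings "<...>" are skipped); fonts use a single-byte
// encoding (no /ToUnicode CMap lookup); literal strings contain no
// unescaped nested parentheses. Requires PHP 7.4+ and ext-zlib.

// One literal string: "(", then escape sequences or other bytes, then ")".
const PDF_LITERAL = '\((?:\\\\.|[^\\\\)])*\)';

function extractPdfText(string $pdf): string
{
    $texts = [];
    // "stream" is followed by CRLF or LF; "endstream" is normally
    // preceded by an end-of-line marker.
    preg_match_all('/stream\r?\n(.*?)\r?\n?endstream/s', $pdf, $streams);
    foreach ($streams[1] as $data) {
        // gzuncompress() returns false (with a warning, silenced here) when
        // the data is not zlib-compressed; the raw bytes are scanned instead.
        $inflated = @gzuncompress($data);
        $texts[] = extractShownStrings($inflated === false ? $data : $inflated);
    }
    return implode("\n", array_filter($texts, fn (string $t): bool => $t !== ''));
}

// Collect operands of "(text) Tj" and "[(te) -20 (xt)] TJ".
function extractShownStrings(string $content): string
{
    $pattern = '/(' . PDF_LITERAL . ')\s*Tj|\[((?:' . PDF_LITERAL . '|[^\]])*)\]\s*TJ/s';
    preg_match_all($pattern, $content, $matches, PREG_SET_ORDER);
    $pieces = [];
    foreach ($matches as $m) {
        if ($m[1] !== '') {
            // Tj: one string; strip the outer parentheses before decoding.
            $pieces[] = decodePdfLiteral(substr($m[1], 1, -1));
            continue;
        }
        // TJ: concatenate the strings, ignoring the numeric kerning values.
        preg_match_all('/' . PDF_LITERAL . '/s', $m[2] ?? '', $strings);
        $joined = '';
        foreach ($strings[0] as $s) {
            $joined .= decodePdfLiteral(substr($s, 1, -1));
        }
        $pieces[] = $joined;
    }
    // Simplification: separate show operations are joined with a space
    // instead of following Td/Tm/T* positioning.
    return implode(' ', $pieces);
}

// Decode escape sequences in a literal string (outer parentheses removed).
function decodePdfLiteral(string $s): string
{
    $simple = ['n' => "\n", 'r' => "\r", 't' => "\t", 'b' => "\x08",
               'f' => "\x0C", '(' => '(', ')' => ')', '\\' => '\\'];
    $decoded = preg_replace_callback(
        '/\\\\([nrtbf()\\\\]|[0-7]{1,3}|\r\n|\r|\n)/',
        function (array $m) use ($simple): string {
            if (isset($simple[$m[1]])) {
                return $simple[$m[1]];
            }
            if (strspn($m[1], '01234567') === strlen($m[1])) {
                return chr(octdec($m[1]) & 0xFF); // \ddd octal byte
            }
            return ''; // backslash + end-of-line: line continuation
        },
        $s
    );
    return $decoded ?? $s;
}
```

Usage: `echo extractPdfText(file_get_contents('document.pdf'));`, where `document.pdf` is a placeholder path. Octal escapes such as `\351` come back as raw single bytes, not UTF-8, which is exactly where Unicode problems start to show up.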