I have PDFs that contain mostly simply formatted text, and I would like to parse that text with PHP. I realize that PDF is a binary format, so I need a utility or library to convert it to text first.
Any recommendations?
Third-party software can dump the text contents of a PDF file, for example:
I ended up using XPDF (which includes pdftotext). This works great, and I use it in production to extract text from millions of PDFs uploaded to our servers.
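As a rough sketch of what a pdftotext call can look like, the shell function below converts one PDF and prints the text on standard output. The function name extract_pdf_text and the choice of the -layout option are assumptions made for illustration, not part of the setup described above.

```shell
# Hypothetical helper (not from the original answer): convert one PDF
# to plain text with pdftotext and print the result on stdout.
extract_pdf_text() {
    pdf=$1

    # Fail early if the input file is missing or unreadable.
    if [ ! -r "$pdf" ]; then
        echo "cannot read: $pdf" >&2
        return 1
    fi

    # pdftotext has to be installed and on PATH (it ships with XPDF).
    if ! command -v pdftotext >/dev/null 2>&1; then
        echo "pdftotext not found in PATH" >&2
        return 127
    fi

    # -layout tries to preserve the physical layout of the page;
    # "-" as the output file name sends the text to stdout.
    pdftotext -layout "$pdf" -
}
```

Calling `extract_pdf_text report.pdf > report.txt` (report.pdf is a placeholder name) would write the extracted text to a file. From PHP, the same pdftotext command could be run with shell_exec(), passing the file path through escapeshellarg() first.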
Below is the install process for Linux CentOS:
You can't do that with file_get_contents() alone. It only returns the raw bytes of the file, and in a PDF the text is usually stored in compressed, encoded streams rather than as plain text. To read or modify a PDF file you can use a third-party library. Take a look at:
And don't forget