I built a webpage where users can submit a PDF, which is then stored in a MEDIUMBLOB column of a MySQL database for later retrieval.
This all works fine, except when the PDF contains images or embedded fonts, in which case the images are corrupted and any text using the font disappears (Acrobat displays a message about the missing font).
I've determined that the problem comes from passing the PDF data through the mysql_real_escape_string function. I have switched to base64_encode/base64_decode on submission/retrieval, which fixed the problem for all new files, but I have about 25 already-submitted PDFs that I need to be able to read.
Is it possible to reverse the effects of mysql_real_escape_string? Or are these files damaged beyond repair?
Sure, should be fixable. You just need to figure out exactly what mysql_real_escape_string does. I believe you just need to undo the escape sequences it produces: a backslash followed by 0, n, r, or Z (standing for NUL, LF, CR, and Ctrl-Z), or by a single quote, a double quote, or another backslash. Should be a one-line regexp fix.
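As a rough illustration of that regexp idea, here is a minimal sketch in Python (chosen only for illustration; the same single-pass logic should port to PHP's preg_replace_callback). The names `unescape_mysql` and `escape_mysql` are hypothetical, and `escape_mysql` is only a stand-in for the escaping rules so the round trip can be checked. It assumes the stored bytes were escaped exactly once and not altered in any other way, for example by a character-set conversion.

```python
import re

# Escape letter -> original byte. Quotes and backslashes map to themselves.
_UNESCAPE = {
    b"0": b"\x00",  # NUL
    b"n": b"\n",    # LF
    b"r": b"\r",    # CR
    b"Z": b"\x1a",  # Ctrl-Z
}

def unescape_mysql(data: bytes) -> bytes:
    """Reverse one round of mysql_real_escape_string-style escaping."""
    # One left-to-right pass: each backslash consumes exactly the next byte,
    # so an escaped backslash (\\) is never mistaken for the start of \n.
    return re.sub(
        rb"\\(.)",
        lambda m: _UNESCAPE.get(m.group(1), m.group(1)),
        data,
        flags=re.DOTALL,
    )

def escape_mysql(data: bytes) -> bytes:
    """Hypothetical stand-in for the escaping, used only to test the round trip."""
    table = {0x00: b"\\0", 0x0A: b"\\n", 0x0D: b"\\r", 0x1A: b"\\Z",
             0x5C: b"\\\\", 0x27: b"\\'", 0x22: b'\\"'}
    return b"".join(table.get(b, bytes([b])) for b in data)

# Every byte value, plus a literal backslash followed by the letter n.
sample = bytes(range(256)) + b"back\\nslash"
assert unescape_mysql(escape_mysql(sample)) == sample
```

If a repaired file still does not open, the damage probably happened somewhere other than the escaping step, and this approach will not help.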
mysql_real_escape_string() prepends backslashes to these characters: \x00, \n, \r, \, ', " and \x1a. The thing is, if your binary data itself contains backslashes, it can be very hard to fix. That said, there is no magic function that undoes it.
Ólafur,
I gathered that from the php manual, and even tried the following:
This seems to work fine when dealing with text, but applying it to the binary data only further degrades the PDF (e.g. paragraphs go missing).
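The snippet that was tried is not shown here, so the following is only a guess at why an un-escaping attempt can look fine on text yet damage binary data. It is a small Python sketch with two hypothetical helpers: `strip_only` drops each backslash without mapping `n` back to LF, and `chained_replace` applies the replacements one after another instead of in a single pass.

```python
import re

# Stored (escaped) form of the six bytes: A, LF, B, backslash, 'n', C
escaped = b"A\\nB\\\\nC"
original = b"A\nB\\nC"

def strip_only(data: bytes) -> bytes:
    """Drop each backslash and keep the following byte unchanged."""
    return re.sub(rb"\\(.)", lambda m: m.group(1), data, flags=re.DOTALL)

def chained_replace(data: bytes) -> bytes:
    """Apply the replacements one after another rather than in one pass."""
    return data.replace(b"\\n", b"\n").replace(b"\\\\", b"\\")

# The LF comes back as the letter 'n': harmless-looking in prose, fatal in a PDF stream.
assert strip_only(escaped) == b"AnB\\nC"

# The escaped backslash followed by 'n' is misread as an escaped LF.
assert chained_replace(escaped) == b"A\nB\\\nC"

assert strip_only(escaped) != original
assert chained_replace(escaped) != original
```

Plain text rarely contains NUL bytes or literal backslashes, which would explain why such mistakes go unnoticed there. Decoding in one left-to-right pass, where each backslash consumes exactly the byte that follows it, avoids both problems.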
I honestly don't know what else it could be. When I changed that bit of code it cleared up the problem, and I've found other instances online where people had the same problem (but no solutions).
Here is the insertion code:
And here is the extraction code: