I'm writing a PHP program that pulls from a database source. Some of the varchar values contain quotes that display as black diamonds with a question mark in them (�, REPLACEMENT CHARACTER), I assume because the text came from Microsoft Word.
How can I use PHP to strip these characters out?
To make sure your MySQL connection is set to UTF-8 (or latin1, depending on what you're using), you can set the connection charset with mysql_set_charset(), or check which charset the connection is currently using.
More info here: http://php.net/manual/en/function.mysql-set-charset.php
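The mysql_* extension linked above was removed in PHP 7.0, so here is a minimal sketch of the same idea with mysqli. The helper name ensureUtf8Connection() and the default charset 'utf8mb4' are my own assumptions, not something from the answer; pass 'utf8' or 'latin1' if that is what your tables use.

```php
<?php
// Sketch only: ensureUtf8Connection() is a hypothetical helper, not a PHP built-in.
// Assumes the mysqli extension is available and $db is an open connection.
function ensureUtf8Connection(mysqli $db, string $charset = 'utf8mb4'): string
{
    // set_charset() returns false if the charset could not be applied.
    if (!$db->set_charset($charset)) {
        throw new RuntimeException('Could not set charset: ' . $db->error);
    }

    // character_set_name() reports the charset the connection is using now.
    return $db->character_set_name();
}

// Usage (placeholder credentials, adjust to your setup):
// $db = new mysqli('localhost', 'user', 'password', 'mydb');
// echo ensureUtf8Connection($db);
```

Calling this once right after connecting, before any query runs, should be enough for the whole request.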
This will help you. Put this inside the <head> tag:
I chose to strip these characters out of the string by doing this:
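The code from this answer is not shown here, so the following is only one possible way to strip the characters, not necessarily what the poster used. It assumes the mbstring extension is loaded; stripInvalidUtf8() is a made-up helper name. It drops bytes that are not valid UTF-8 and also removes any literal U+FFFD that is already stored in the data.

```php
<?php
// Sketch only: stripInvalidUtf8() is a hypothetical helper. Requires mbstring.
function stripInvalidUtf8(string $text): string
{
    // Remember the current substitute character so it can be restored afterwards.
    $previous = mb_substitute_character();

    // 'none' tells mbstring to drop invalid bytes instead of replacing them with '?'.
    mb_substitute_character('none');
    $clean = mb_convert_encoding($text, 'UTF-8', 'UTF-8');
    mb_substitute_character($previous);

    // Also remove a literal U+FFFD (bytes EF BF BD) that was stored as real data.
    return str_replace("\xEF\xBF\xBD", '', $clean);
}

// Usage: "\xFF" can never appear in valid UTF-8, so it is dropped.
echo stripInvalidUtf8("abc\xFFdef"), "\n";
```

Keep in mind that stripping throws information away: a curly quote stored in another encoding simply disappears rather than being displayed correctly.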
Just add these lines before the headers. The accurate format of .doc/.docx files will be retrieved:
I also faced this � issue. Meanwhile I ran into three cases where it happened:
substr()
I was using substr() on a UTF-8 string, which cut UTF-8 characters in the middle, so the cut characters could not be displayed correctly. Use mb_substr($utfstring, 0, 10, 'utf-8'); instead.
htmlspecialchars()
Another problem was using htmlspecialchars() on a UTF-8 string. The fix is to use: htmlspecialchars($utfstring, ENT_QUOTES, 'UTF-8');
preg_replace()
Lastly I found out that preg_replace() can lead to problems with UTF-8. The code $string = preg_replace('/[^A-Za-z0-9ÄäÜüÖöß]/', ' ', $string); for example transformed the UTF-8 string "F(×)=2×-3" into "F � 2� ". The fix is to use mb_ereg_replace() instead.
I hope this additional information will help to get rid of such problems.
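To make the three cases above concrete, here is a small sketch that puts the byte-oriented and the multibyte-aware calls side by side. It assumes the mbstring extension (with its regex support) is loaded and that the script file itself is saved as UTF-8. The preg_replace() call with the /u modifier is an alternative I am adding, not something the answer mentions.

```php
<?php
// Assumes mbstring is loaded and this file is saved as UTF-8.
$text = 'F(×)=2×-3';

// 1. substr() counts bytes, so it can cut the two-byte "×" in half.
$byteCut = substr($text, 0, 3);              // "F(" plus the first byte of "×"
$charCut = mb_substr($text, 0, 3, 'UTF-8');  // "F(×", a whole character

// 2. htmlspecialchars() with the encoding stated explicitly.
$escaped = htmlspecialchars('2×3 < "7"', ENT_QUOTES, 'UTF-8');

// 3. mb_ereg_replace() works on characters; note the pattern has no delimiters.
mb_regex_encoding('UTF-8');
$viaMbEreg = mb_ereg_replace('[^A-Za-z0-9ÄäÜüÖöß]', ' ', $text);

// Alternative (my assumption, not from the answer): the /u modifier makes
// preg_replace() treat pattern and subject as UTF-8 as well.
$viaPregU = preg_replace('/[^A-Za-z0-9ÄäÜüÖöß]/u', ' ', $text);

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // false: the cut left a broken sequence
var_dump($viaMbEreg === $viaPregU);             // true: both replace whole characters
```

In both regex variants every character outside the allowed set becomes exactly one space, so no half-replaced multibyte sequence is left behind.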
As mentioned in earlier answers, it is happening because your text has been written to the database in iso-8859-1 encoding, or some other format.
So you just need to convert the data to utf8 before outputting it.
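A minimal sketch of that conversion, assuming the mbstring extension is loaded. The helper name toUtf8() is made up, and the default source encoding Windows-1252 is my assumption (that is where Word's curly quotes usually come from); pass 'ISO-8859-1' or whatever your data really is.

```php
<?php
// Sketch only: toUtf8() is a hypothetical helper. Requires mbstring.
function toUtf8(string $text, string $from = 'Windows-1252'): string
{
    // Leave strings that are already valid UTF-8 alone to avoid double encoding.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    return mb_convert_encoding($text, 'UTF-8', $from);
}

// Usage: 0x93 and 0x94 are the curly double quotes in Windows-1252.
echo toUtf8("\x93quoted\x94"), "\n";
```

The validity check is a heuristic: a short non-UTF-8 string can happen to be valid UTF-8 by accident, so converting at a single well-defined point (for example right after reading from the database) is safer than guessing per string.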