I'm setting up a new server and want to support UTF-8 fully in my web application. I have tried this in the past on existing servers and always seem to end up having to fall back to ISO-8859-1.
Where exactly do I need to set the encoding/charsets? I'm aware that I need to configure Apache, MySQL, and PHP to do this — is there some standard checklist I can follow, or perhaps troubleshoot where the mismatches occur?
This is for a new Linux server running MySQL 5, PHP 5, and Apache 2.
Just a note:
If your non-Latin characters show up as ?????????, you asked a question that got closed with a reference to this canonical question, and no matter what you try you still get ????????? from MySQL, the most likely cause is that you are testing against old data. That data was inserted into the database using the wrong charset, so it was converted and stored as literal question mark characters (?). This means the original text is lost for good, and no setting will bring it back. Re-applying what you have learned from the answers to this question on fresh data could solve your problem.
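To see why the old data cannot be recovered, here is a minimal sketch of a lossy conversion done in PHP. It is only an analogy: it uses mb_convert_encoding rather than MySQL itself, and the sample string and variable names are made up for illustration.

```php
<?php
// Assumption: this imitates a lossy charset conversion in PHP;
// it does not reproduce MySQL's exact behavior.
mb_substitute_character(0x3F); // substitute "?" for unrepresentable characters

// "привет" written as raw UTF-8 bytes, so the example does not depend
// on how this file itself is saved.
$original = "\xD0\xBF\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

// Cyrillic has no representation in ISO-8859-1, so each character is replaced.
$lossy = mb_convert_encoding($original, 'ISO-8859-1', 'UTF-8');

// Converting back cannot restore anything: only "?" bytes were stored.
$roundTrip = mb_convert_encoding($lossy, 'UTF-8', 'ISO-8859-1');

var_dump($lossy);
var_dump($roundTrip === $original);
```

Once the question marks have been written, the stored bytes are plain ASCII question marks, so changing connection or table charsets afterwards has nothing left to decode.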
In my case, I was using mb_split, which uses regex. Therefore I also had to manually make sure the regex encoding was UTF-8 by calling mb_regex_encoding('UTF-8');
As a side note, I also discovered by running mb_internal_encoding() that the internal encoding wasn't UTF-8, and I changed that by calling mb_internal_encoding("UTF-8");
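A short sketch of those two calls together with mb_split might look like this. The sample text and the split pattern are assumptions for illustration only.

```php
<?php
// Set both encodings explicitly instead of relying on php.ini defaults.
mb_internal_encoding('UTF-8');
mb_regex_encoding('UTF-8');

// Sample text as raw UTF-8 bytes: "café, naïve, jalapeño".
$text = "caf\xC3\xA9, na\xC3\xAFve, jalape\xC3\xB1o";

// mb_split takes a pattern without delimiters and uses the regex encoding set above.
$parts = mb_split(',\s*', $text);

foreach ($parts as $part) {
    // mb_strlen counts characters using the internal encoding, not bytes.
    echo $part, ' (', mb_strlen($part), " chars)\n";
}
```

Calling mb_internal_encoding() and mb_regex_encoding() with no argument returns the current value, which is a quick way to check what a given server is actually using.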
The only thing I would add to these amazing answers is to emphasize saving your files in UTF-8 encoding. I have noticed that browsers honor the file's actual encoding over the encoding you declare in your code. Any decent text editor will show you this; for example, Notepad++ has a menu option for file encoding that shows the current encoding and lets you change it. For all my PHP files I use UTF-8 without BOM.
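If you want to check for a BOM from PHP rather than from an editor, something like the following could work. Both function names are hypothetical helpers, not part of PHP.

```php
<?php
// Hypothetical helper: true if the bytes start with the UTF-8 BOM (EF BB BF).
function has_utf8_bom($bytes)
{
    return strncmp($bytes, "\xEF\xBB\xBF", 3) === 0;
}

// Hypothetical helper: returns the bytes without a leading UTF-8 BOM.
function strip_utf8_bom($bytes)
{
    // The cast covers older PHP versions, where substr() can return false
    // when nothing follows the BOM.
    return has_utf8_bom($bytes) ? (string) substr($bytes, 3) : $bytes;
}

// Usage: check the current script file.
$source = file_get_contents(__FILE__);
echo has_utf8_bom($source) ? "BOM found\n" : "no BOM\n";
```

A BOM in a PHP file is sent to the browser as output before anything else, which is one reason to save PHP files without it.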
Some time ago, someone asked me to add UTF-8 support to a PHP/MySQL application designed by someone else. I noticed that all the files were encoded in ANSI, so I had to use iconv to convert all files, change the database tables to use the utf8 charset and the utf8_general_ci collation, add 'SET NAMES utf8' to the database abstraction layer after the connection (if using a PHP version earlier than 5.3.6; otherwise use charset=utf8 in the connection string), and change string functions to their PHP multibyte string equivalents.
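Here is a rough sketch of those steps in PHP. It assumes the legacy text is ISO-8859-1 (for real Windows "ANSI" files the source charset is more likely CP1252), and the host, database name, and credentials in the connection part are placeholders.

```php
<?php
// 1. Convert legacy bytes to UTF-8 with iconv.
//    Assumption: the source is ISO-8859-1; adjust if your files are CP1252.
$legacy = "caf\xE9";                            // "café" in ISO-8859-1
$utf8   = iconv('ISO-8859-1', 'UTF-8', $legacy); // now "caf\xC3\xA9"

// 2. Use multibyte-aware functions instead of the byte-oriented ones.
var_dump(strlen($utf8));                   // counts bytes
var_dump(mb_strlen($utf8, 'UTF-8'));       // counts characters
var_dump(mb_substr($utf8, 3, 1, 'UTF-8')); // the single character "é"
var_dump(mb_strtoupper($utf8, 'UTF-8'));   // upper-cases "é" as well

// 3. Declare the connection charset (placeholder host and database name).
$dsn = 'mysql:host=localhost;dbname=example_db;charset=utf8';
// $pdo = new PDO($dsn, $user, $password);
// On PHP versions before 5.3.6 the charset part of the DSN is ignored,
// so run this right after connecting instead:
// $pdo->exec('SET NAMES utf8');
```

The connection lines are commented out so the sketch runs without a database; uncomment them and supply your own credentials to try it against a real server.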