I have a MySQL database of strings that I left dormant for a while. Now that I have picked it up again, I noticed that all the special characters are garbled. My ISP moved the server to a different machine, and I suspect that is when it happened.
The database was populated by a PHP script. Everything was supposed to be in UTF-8, which is what the database is set to.
However, this is what a string looks like now:
fÃƒÂªte
Those four special characters are supposed to be one character, ê, and the string is meant to be fête.
At first glance this looks like it was simply re-encoded twice, but that doesn't seem quite right. Those four characters in hex are:
C3 83 C6 92 C3 82 C2 AA
This looks very much like UTF-8, so if we decode it, we get
C3 3F C2 AA
This isn't quite UTF-8 (because of the 3F), but let's decode it again:
FF AA
This is not UTF-8.
The ê character is code point EA; in UTF-8, that would be C3 AA.
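To double-check the hex analysis, here is a minimal Python sketch that decodes the eight bytes from the dump above and lists the resulting code points. The helper name hex_dump is made up for illustration, and the comparison with Windows-1252 is an assumption on my part (MySQL's latin1 is documented as cp1252, which might explain where the 3F comes from).

```python
def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex (illustrative helper)."""
    return " ".join(f"{b:02X}" for b in data)

# Bytes found in the database where the single character "ê" should be.
observed = bytes.fromhex("C3 83 C6 92 C3 82 C2 AA")

# Decoding as UTF-8 succeeds and yields four code points, not one.
chars = observed.decode("utf-8")
code_points = [f"U+{ord(c):04X}" for c in chars]
print(code_points)  # ['U+00C3', 'U+0192', 'U+00C2', 'U+00AA']

# U+0192 has no ISO 8859-1 byte, so a strict Latin-1 conversion
# substitutes "?" (0x3F) for it.
latin1 = chars.encode("latin-1", errors="replace")
print(hex_dump(latin1))  # C3 3F C2 AA

# Windows-1252 does have a byte for U+0192, namely 0x83 (assumption:
# this is the character set that was actually involved).
cp1252 = chars.encode("cp1252")
print(hex_dump(cp1252))  # C3 83 C2 AA
```

If the second result is the right reading, the 3F is only an artifact of decoding with strict Latin-1, and the underlying byte would be 83.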
Another example: the Spanish upside-down question mark (¿) is stored as C3 83 E2 80 9A C3 82 C2 BF, which decodes to C3 3F C2 BF, which again isn't proper UTF-8 (it translates to FF BF). The expected character for ¿ is BF, i.e. C2 BF in proper UTF-8.
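One hypothesis that can be tested offline: the correct UTF-8 bytes were misread as Windows-1252 text and stored again as UTF-8, and this happened twice. The sketch below simulates that under exactly this assumption; misread_once, layers and hex_dump are hypothetical helper names, not anything MySQL or PHP provides.

```python
def hex_dump(data: bytes) -> str:
    """Format bytes as space-separated uppercase hex (illustrative helper)."""
    return " ".join(f"{b:02X}" for b in data)

def misread_once(data: bytes) -> bytes:
    """Treat UTF-8 bytes as Windows-1252 text, then store that text as UTF-8."""
    return data.decode("cp1252").encode("utf-8")

def layers(text: str, rounds: int = 2) -> list:
    """Return the byte sequence after 0, 1, ..., `rounds` misreadings."""
    data = text.encode("utf-8")
    stages = [data]
    for _ in range(rounds):
        data = misread_once(data)
        stages.append(data)
    return stages

for stage, data in enumerate(layers("ê")):
    print(stage, hex_dump(data))
# 0 C3 AA
# 1 C3 83 C2 AA
# 2 C3 83 C6 92 C3 82 C2 AA
```

Stage 2 matches the eight bytes seen for ê. Running layers("¿") the same way gives a sequence containing E2 80 9A, the UTF-8 form of U+201A, which Windows-1252 assigns to byte 0x82; that would account for the three-byte group in the second example.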
What happened here? How did the characters get messed up? More importantly, how do I fix it?
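For what it's worth, if the damage really is UTF-8 that was misread as Windows-1252 and re-stored (an assumption, not something I have confirmed on the server), then reversing it outside the database might look like the sketch below. The function names undo_layer and repair are invented for this example, and it should only ever be tried on a copy of the data.

```python
def undo_layer(text: str) -> str:
    """Reverse one round of 'UTF-8 bytes misread as Windows-1252'."""
    return text.encode("cp1252").decode("utf-8")

def repair(text: str, max_rounds: int = 3) -> str:
    """Peel off misreading layers until one no longer round-trips cleanly."""
    for _ in range(max_rounds):
        try:
            candidate = undo_layer(text)
        except (UnicodeEncodeError, UnicodeDecodeError):
            break  # the text is not (or no longer) mojibake of this kind
        if candidate == text:
            break  # plain ASCII: nothing left to undo
        text = candidate
    return text

# The damaged string from the database, built from its raw bytes.
broken = bytes.fromhex("66 C3 83 C6 92 C3 82 C2 AA 74 65").decode("utf-8")
fixed = repair(broken)
print(fixed)  # fête
```

The loop stops as soon as a further round would fail, so already-correct strings such as "fête" pass through unchanged. Text that legitimately contains sequences like "Ã©" could still be altered by mistake, so the results need to be checked by eye.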
(Side note: the new server requires me to call mysql_set_charset("utf8"); otherwise strings get garbled too, although in the usual "UTF-8 read as latin1" fashion, not in the weird fashion seen above.)
TL;DR:
- MySQL database was populated in UTF-8 through a PHP script.
- It lay dormant for years, and the server got migrated.
- Now characters are messed up, see above.