I have a MySQL table column that I'm trying to convert from latin1 to UTF8. Some of the values in the column are UTF8 already, but they are being stored as latin1, which results in some strange looking text. Switching the column over is pretty straight forward, all I have to do is:
ALTER TABLE `user_profiles` MODIFY `last_name` varchar(20) CHARACTER SET utf8;
The next step would be converting any now doubly encoded columns back to UTF8. I can get a list of all those columns by running the following command:
SELECT `last_name`, CONVERT(CAST(CONVERT(`last_name` USING latin1) AS BINARY) USING utf8) AS `converted_last_name`
FROM `user_profiles`
WHERE (CONVERT(CAST(CONVERT(`last_name` USING latin1) AS BINARY) USING utf8)) IS NOT NULL
AND CONVERT(CAST(CONVERT(`last_name` USING latin1) AS BINARY) USING utf8) != `last_name`;
Which gives me something like:
| last_name | converted_last_name |
| ----------------|----------------------|
| 王维雄 | 王维雄 |
| niño de rivera | niño de rivera |
| Thölix | Thölix |
Looks good. Now's when things get weird though. If I run the update command:
UPDATE `user_profiles`
SET `last_name` = CONVERT(CAST(CONVERT(`last_name` USING latin1) AS BINARY) USING utf8)
WHERE (CONVERT(CAST(CONVERT(`last_name` USING latin1) AS BINARY) USING utf8)) IS NOT NULL
AND CONVERT(CAST(CONVERT(`last_name` USING latin1) AS BINARY) USING utf8) != `last_name`
I get a 1300
error like this:
#1300 - Invalid utf8 character string: 'E36F'
Any idea why the update is different than the select? Any thoughts on how to get around this issue?