I have ended up with garbled character encodings in one of our MySQL columns.
Typically I see
√© instead of é
√∂ instead of ö
√≠ instead of í
and so on...
I'm fairly certain that someone here knows what happened and how to fix it.
UPDATE: Based on bobince's answer, and since I had this data in a file, I did the following:
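#!/usr/bin/env python
import codecs

# Read the garbled text and write the repaired text, both as UTF-8.
f = codecs.open('./file.csv', 'r', 'utf-8')
f2 = codecs.open('./file-fixed.csv', 'w', 'utf-8')
for line in f:
    # Recover the original bytes via Mac Roman, then decode them as UTF-8.
    f2.write(line.encode('macroman').decode('utf-8'))
f.close()
f2.close()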
after which
load data infile 'file-fixed.csv'
into table list1
fields terminated by ','
optionally enclosed by '"'
ignore 1 lines;
properly imported the data.
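For anyone who wants to check the diagnosis on their own data first, here is a minimal Python 3 sketch of the round trip. It assumes the damage came from UTF-8 bytes being read as Mac Roman, which is what the samples above suggest; `garble` and `repair` are hypothetical helper names, not part of any library.

```python
def garble(text):
    """Simulate the damage: UTF-8 bytes misread as Mac Roman."""
    return text.encode('utf-8').decode('mac_roman')


def repair(text):
    """Undo it: recover the original bytes, then decode them as UTF-8."""
    try:
        return text.encode('mac_roman').decode('utf-8')
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not garbled this way (or not cleanly reversible): leave untouched.
        return text


for original in ('é', 'ö', 'í'):
    damaged = garble(original)
    print(damaged, '->', repair(damaged))
```

The `try`/`except` is there so that text which was never garbled passes through unchanged instead of aborting the whole run, which matters if only some rows are affected.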
UPDATE2: Hammerite, just for completeness here are the requested details...
mysql> SHOW VARIABLES LIKE 'character_set%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
The SHOW CREATE TABLE output for the table I am importing into has DEFAULT CHARSET=utf8.
EDIT3:
Actually, with the above settings the load didn't do the right thing: I could not compare the loaded data to existing utf8 fields, and it only looked as if it had been loaded correctly (I assume because the client, connection and results charsets were wrong but matching). So I updated the settings to:
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
I then loaded the data again, and this time it finally loaded correctly (comparable with existing data).
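My guess about why the first load only looked right can be simulated in Python 3. This is a sketch of that assumption, not a trace of what the server actually did: it assumes the UTF-8 file bytes were read as latin1 on the way in, converted back through latin1 on the way out, and displayed by a UTF-8 terminal. Python's `latin-1` codec stands in for MySQL's latin1 here, which is close enough for these particular bytes.

```python
original = 'é'

# The CSV file holds correct UTF-8 bytes.
file_bytes = original.encode('utf-8')

# Assumption: with latin1 settings, each byte is read as one latin1 character.
as_read = file_bytes.decode('latin-1')

# The utf8 column then stores those two characters, i.e. double-encoded bytes.
stored = as_read.encode('utf-8')

# Reading back with latin1 results converts the stored text to latin1 bytes...
sent_to_client = stored.decode('utf-8').encode('latin-1')

# ...which a UTF-8 terminal displays as the original character.
shown = sent_to_client.decode('utf-8')

print(repr(as_read), stored, repr(shown))
print(stored == file_bytes)
```

So the value displays as expected while the stored bytes differ from those of a properly loaded 'é', which would explain why comparisons against the existing utf8 fields failed.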