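I have to deal with a database that has broken UTF-8 characters scattered across several tables. The list of characters isn't very extensive, AFAIK (áéíúóÁÉÍÓÚÑñ).
Fixing a given table is very straightforward:
update orderItem set itemName=replace(itemName,'Ã¡','á');
But I can't find a way to detect the broken characters. If I do something like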
SELECT * FROM TABLE WHERE field LIKE "%Ã%";
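I get almost every row, because of the collation (Ã = A). All the broken characters so far start with "Ã". The database is in Spanish, so it doesn't use that particular character.
The list of broken characters I've got so far is:
Ã¡ = á
Ã© = é
Ã- = í
Ã³ = ó
Ã± = ñ
Ã = Á
Any idea how to make this SELECT work as intended? (A binary search or something like that.)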
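To see why every broken character in the list starts with "Ã", it can help to reproduce the damage outside the database. The following is only a sketch in Python, not MySQL code: it assumes the corruption came from UTF-8 bytes being read as cp1252 (which is what MySQL calls latin1), and the helper name `double_encode` is made up for this illustration.

```python
def double_encode(text: str) -> str:
    """Return what `text` looks like after its UTF-8 bytes are misread as cp1252."""
    out = []
    for byte in text.encode("utf-8"):
        try:
            out.append(bytes([byte]).decode("cp1252"))
        except UnicodeDecodeError:
            # cp1252 leaves a few bytes (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined;
            # assume they map to the control character with the same code.
            out.append(chr(byte))
    return "".join(out)


# Print the broken form of each character from the question.
for ch in "áéíóñÁ":
    print(repr(ch), "->", repr(double_encode(ch)))
```

Every accented Latin letter in the question encodes to a two-byte UTF-8 sequence whose first byte is 0xC3, and 0xC3 is "Ã" in cp1252. The second byte of "í" is a soft hyphen and the second byte of "Á" is an unprintable control character, which would explain why those two look like they are missing a character.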
Answer 1:
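How about a different approach, namely converting the column back and forth to get the correct character set? You can convert it to binary, then to UTF-8, and then to ISO-8859-1 or whatever else you're using. See the manual for the details.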
Answer 2:
I fixed it with:
UPDATE wp_zcs9ck_posts_copy SET post_title =
CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8);
Full solution: http://jonisalonen.com/2012/fixing-doubly-utf-8-encoded-text-in-mysql/
Answer 3:
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€™','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¦','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€“','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€œ','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€˜','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¢','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
Answer 4:
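No text replacement is a universal solution, because you can always forget some characters. A more appropriate fix for doubly converted text is:
- convert back to latin1
- convert to binary
- convert to utf8
Like this: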
alter table descriptions modify name VARCHAR(2000) character set latin1;
alter table descriptions modify name blob;
alter table descriptions modify name VARCHAR(2000) character set utf8;
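The three steps can be mimicked on a single string to see what they do to the bytes. This is a sketch in Python rather than MySQL, under the assumption that MySQL's latin1 behaves like cp1252 with the undefined bytes mapped to control characters; `to_latin1_bytes` and `repair` are names invented for the example. The middle step matters because, as far as I know, MySQL converts values when a column goes straight from one character set to another, but keeps the bytes untouched when converting to or from BLOB.

```python
def to_latin1_bytes(text: str) -> bytes:
    """Steps 1 and 2: map each character back to its single latin1/cp1252 byte."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Control characters such as U+0081 have no cp1252 byte;
            # assume they map to the byte with the same code.
            out += ch.encode("latin-1")
    return bytes(out)


def repair(broken: str) -> str:
    """Step 3: reinterpret the raw bytes as UTF-8."""
    return to_latin1_bytes(broken).decode("utf-8")


print(repair("Ã©tudiantes"))
```

Note that `repair` raises an error on text that was never double-encoded, which corresponds to the situation where the column holds a mix of good and bad rows.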
Answer 5:
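Thanks for your answers!
I fixed my table and wanted to share the complete list of changes. Note that it also includes fixes for HTML-encoded characters, beyond the Latin ones; it was really a mess: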
update `table` set `field` = replace(`field` ,'Ã‰','É');
update `table` set `field` = replace(`field` ,'â€œ','"');
update `table` set `field` = replace(`field` ,'â€','"');
update `table` set `field` = replace(`field` ,'Ã‡','Ç');
update `table` set `field` = replace(`field` ,'Ãƒ','Ã');
-- Edit by slash4
update `table` set `field` = replace(`field` ,'Ã ','À');
update `table` set `field` = replace(`field` ,'Ãº','ú');
update `table` set `field` = replace(`field` ,'â€¢','-');
update `table` set `field` = replace(`field` ,'Ã˜','Ø');
update `table` set `field` = replace(`field` ,'Ãµ','õ');
-- The next one appears to be missing a character. But which one?
update `table` set `field` = replace(`field` ,'Ã','í');
update `table` set `field` = replace(`field` ,'Ã¢','â');
update `table` set `field` = replace(`field` ,'Ã£','ã');
update `table` set `field` = replace(`field` ,'Ãª','ê');
update `table` set `field` = replace(`field` ,'Ã¡','á');
update `table` set `field` = replace(`field` ,'Ã©','é');
update `table` set `field` = replace(`field` ,'Ã³','ó');
update `table` set `field` = replace(`field` ,'â€“','–');
update `table` set `field` = replace(`field` ,'Ã§','ç');
update `table` set `field` = replace(`field` ,'Âª','ª');
update `table` set `field` = replace(`field` ,'Âº','º');
update `table` set `field` = replace(`field` ,'Ã ','à');
update `table` set `field` = replace(`field` ,'&ccedil;','ç');
update `table` set `field` = replace(`field` ,'&atilde;','ã');
update `table` set `field` = replace(`field` ,'&aacute;','á');
update `table` set `field` = replace(`field` ,'&acirc;','â');
update `table` set `field` = replace(`field` ,'&eacute;','é');
update `table` set `field` = replace(`field` ,'&iacute;','í');
update `table` set `field` = replace(`field` ,'&otilde;','õ');
update `table` set `field` = replace(`field` ,'&uacute;','ú');
update `table` set `field` = replace(`field` ,'&ccedil;','ç');
update `table` set `field` = replace(`field` ,'&Aacute;','Á');
update `table` set `field` = replace(`field` ,'&Acirc;','Â');
update `table` set `field` = replace(`field` ,'&Eacute;','É');
update `table` set `field` = replace(`field` ,'&Iacute;','Í');
update `table` set `field` = replace(`field` ,'&Otilde;','Õ');
update `table` set `field` = replace(`field` ,'&Uacute;','Ú');
update `table` set `field` = replace(`field` ,'&Ccedil;','Ç');
update `table` set `field` = replace(`field` ,'&Atilde;','Ã');
update `table` set `field` = replace(`field` ,'&Agrave;','À');
update `table` set `field` = replace(`field` ,'&Ecirc;','Ê');
update `table` set `field` = replace(`field` ,'&Oacute;','Ó');
update `table` set `field` = replace(`field` ,'&Ocirc;','Ô');
update `table` set `field` = replace(`field` ,'&Uuml;','Ü');
update `table` set `field` = replace(`field` ,'&atilde;','ã');
update `table` set `field` = replace(`field` ,'&agrave;','à');
update `table` set `field` = replace(`field` ,'&ecirc;','ê');
update `table` set `field` = replace(`field` ,'&oacute;','ó');
update `table` set `field` = replace(`field` ,'&ocirc;','ô');
update `table` set `field` = replace(`field` ,'&uuml;','ü');
update `table` set `field` = replace(`field` ,'&amp;','&');
update `table` set `field` = replace(`field` ,'&gt;','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&circ;','ˆ');
update `table` set `field` = replace(`field` ,'&tilde;','˜');
update `table` set `field` = replace(`field` ,'&uml;','¨');
update `table` set `field` = replace(`field` ,'&acute;','´');
update `table` set `field` = replace(`field` ,'&cedil;','¸');
update `table` set `field` = replace(`field` ,'&quot;','"');
update `table` set `field` = replace(`field` ,'&ldquo;','“');
update `table` set `field` = replace(`field` ,'&rdquo;','”');
update `table` set `field` = replace(`field` ,'&lsquo;','‘');
update `table` set `field` = replace(`field` ,'&rsquo;','’');
update `table` set `field` = replace(`field` ,'&lsaquo;','‹');
update `table` set `field` = replace(`field` ,'&rsaquo;','›');
update `table` set `field` = replace(`field` ,'&laquo;','«');
update `table` set `field` = replace(`field` ,'&raquo;','»');
update `table` set `field` = replace(`field` ,'&ordm;','º');
update `table` set `field` = replace(`field` ,'&ordf;','ª');
update `table` set `field` = replace(`field` ,'&ndash;','–');
update `table` set `field` = replace(`field` ,'&mdash;','—');
update `table` set `field` = replace(`field` ,'&macr;','¯');
update `table` set `field` = replace(`field` ,'&hellip;','…');
update `table` set `field` = replace(`field` ,'&brvbar;','¦');
update `table` set `field` = replace(`field` ,'&bull;','•');
update `table` set `field` = replace(`field` ,'&para;','¶');
update `table` set `field` = replace(`field` ,'&sect;','§');
update `table` set `field` = replace(`field` ,'&sup1;','¹');
update `table` set `field` = replace(`field` ,'&sup2;','²');
update `table` set `field` = replace(`field` ,'&sup3;','³');
update `table` set `field` = replace(`field` ,'&frac12;','½');
update `table` set `field` = replace(`field` ,'&frac14;','¼');
update `table` set `field` = replace(`field` ,'&frac34;','¾');
update `table` set `field` = replace(`field` ,'&#8539;','⅛');
update `table` set `field` = replace(`field` ,'&#8540;','⅜');
update `table` set `field` = replace(`field` ,'&#8541;','⅝');
update `table` set `field` = replace(`field` ,'&#8542;','⅞');
update `table` set `field` = replace(`field` ,'&gt;','>');
update `table` set `field` = replace(`field` ,'&lt;','<');
update `table` set `field` = replace(`field` ,'&plusmn;','±');
update `table` set `field` = replace(`field` ,'&minus;','−');
update `table` set `field` = replace(`field` ,'&times;','×');
update `table` set `field` = replace(`field` ,'&divide;','÷');
update `table` set `field` = replace(`field` ,'&lowast;','∗');
update `table` set `field` = replace(`field` ,'&frasl;','⁄');
update `table` set `field` = replace(`field` ,'&permil;','‰');
update `table` set `field` = replace(`field` ,'&int;','∫');
update `table` set `field` = replace(`field` ,'&sum;','∑');
update `table` set `field` = replace(`field` ,'&prod;','∏');
update `table` set `field` = replace(`field` ,'&radic;','√');
update `table` set `field` = replace(`field` ,'&infin;','∞');
update `table` set `field` = replace(`field` ,'&asymp;','≈');
update `table` set `field` = replace(`field` ,'&cong;','≅');
update `table` set `field` = replace(`field` ,'&prop;','∝');
update `table` set `field` = replace(`field` ,'&equiv;','≡');
update `table` set `field` = replace(`field` ,'&ne;','≠');
update `table` set `field` = replace(`field` ,'&le;','≤');
update `table` set `field` = replace(`field` ,'&ge;','≥');
update `table` set `field` = replace(`field` ,'&there4;','∴');
update `table` set `field` = replace(`field` ,'&sdot;','⋅');
update `table` set `field` = replace(`field` ,'&middot;','·');
update `table` set `field` = replace(`field` ,'&part;','∂');
update `table` set `field` = replace(`field` ,'&image;','ℑ');
update `table` set `field` = replace(`field` ,'&real;','ℜ');
update `table` set `field` = replace(`field` ,'&prime;','′');
update `table` set `field` = replace(`field` ,'&Prime;','″');
update `table` set `field` = replace(`field` ,'&deg;','°');
update `table` set `field` = replace(`field` ,'&ang;','∠');
update `table` set `field` = replace(`field` ,'&perp;','⊥');
update `table` set `field` = replace(`field` ,'&nabla;','∇');
update `table` set `field` = replace(`field` ,'&oplus;','⊕');
update `table` set `field` = replace(`field` ,'&otimes;','⊗');
update `table` set `field` = replace(`field` ,'&alefsym;','ℵ');
update `table` set `field` = replace(`field` ,'&oslash;','ø');
update `table` set `field` = replace(`field` ,'&Oslash;','Ø');
update `table` set `field` = replace(`field` ,'&isin;','∈');
update `table` set `field` = replace(`field` ,'&notin;','∉');
update `table` set `field` = replace(`field` ,'&cap;','∩');
update `table` set `field` = replace(`field` ,'&cup;','∪');
update `table` set `field` = replace(`field` ,'&sub;','⊂');
update `table` set `field` = replace(`field` ,'&sup;','⊃');
update `table` set `field` = replace(`field` ,'&sube;','⊆');
update `table` set `field` = replace(`field` ,'&supe;','⊇');
update `table` set `field` = replace(`field` ,'&exist;','∃');
update `table` set `field` = replace(`field` ,'&forall;','∀');
update `table` set `field` = replace(`field` ,'&empty;','∅');
update `table` set `field` = replace(`field` ,'&not;','¬');
update `table` set `field` = replace(`field` ,'&and;','∧');
update `table` set `field` = replace(`field` ,'&or;','∨');
update `table` set `field` = replace(`field` ,'&crarr;','↵');
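For the HTML-encoded part of the mess, a hand-written list is easy to leave incomplete. If the data can be passed through application code, a general decoder covers every named and numeric entity at once. A minimal sketch in Python, using the standard library's `html.unescape`; the wrapper name `decode_entities` is just for illustration, and how the rows are read and written back is left out on purpose.

```python
import html


def decode_entities(text: str) -> str:
    """Replace HTML entities such as &Eacute; or &#8539; with the characters they stand for."""
    return html.unescape(text)


print(decode_entities("&Eacute;cole &amp; caf&eacute;"))
```

This only handles the entity problem; double-encoded UTF-8 still needs one of the other fixes in this thread.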
Answer 6:
The SELECT needs to be stated as follows:
SELECT * FROM TABLE WHERE LENGTH(name) != CHAR_LENGTH(name);
This returns all rows that contain multibyte characters. name is assumed to be the field/column where the odd characters would be found.
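The comparison works because LENGTH counts bytes while CHAR_LENGTH counts characters. The same test can be sketched in Python to show what it does and does not catch; this assumes a UTF-8 column, and `has_multibyte` is a name made up for the example.

```python
def has_multibyte(text: str) -> bool:
    """True when the UTF-8 byte length differs from the character count."""
    return len(text.encode("utf-8")) != len(text)


for value in ["nino", "niño", "niÃ±o"]:
    print(repr(value), has_multibyte(value))
```

Both the correct "niño" and the broken "niÃ±o" are flagged, so the query narrows the search to rows with non-ASCII content rather than pinpointing the broken ones.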
Answer 7:
This saved my life:
UPDATE ohp_posts SET post_content = CONVERT(CAST(CONVERT(post_content USING latin1) AS BINARY) USING utf8)
I found it here: http://stanis.net/2014/04/replacing-latin-1-with-utf-8-characters-in-mysql/
Answer 8:
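I had the same problem but didn't like the REPLACE() solutions, because there is always the possibility of missing some characters. I was working on a column of mixed data (some already utf8_encode()d and some not) of around 4 million rows, about 250,000 of them with wrongly encoded data (with characters such as Ã‰), covering about 15 international languages, mostly European ones but also Russian, Japanese and Chinese.
I started by duplicating the column, since I didn't want to lose any data: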
ALTER TABLE images ADD COLUMN reptitle TEXT;
Then I copied over all the data that has multibyte characters (thanks to Adam for the tip):
UPDATE images SET reptitle = title WHERE LENGTH(title) != CHAR_LENGTH(title)
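Because reptitle was created with the table's default character set, it is already utf8, but it contains corrupted data, since the images table used to be latin1. The reptitle column now contains some data that is correctly encoded and some that is corrupted (all values with multibyte characters, some of which were already correctly utf8_encode()d). So then, with David's tip...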
ALTER TABLE images MODIFY reptitle TEXT character set latin1;
ALTER TABLE images MODIFY reptitle BLOB;
ALTER TABLE images MODIFY reptitle TEXT character set utf8;
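Since TEXT and BLOB are (I think) the same, the intermediate step may not be necessary. This had the effect of correcting all the badly encoded data ("Ã©tudiantes" became "étudiantes", and so on), but the data that was previously correct was truncated at the first multibyte character. I don't know why it was truncated, but it was in a throwaway column, so I didn't care. The truncated data gave the same values for CHAR_LENGTH and LENGTH, because no multibyte characters remained, so it was easy to query...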
UPDATE images SET title = reptitle WHERE LENGTH(reptitle)!=CHAR_LENGTH(reptitle)
Then, of course, I just dropped the spare column:
ALTER TABLE images DROP COLUMN reptitle
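Also make sure (since I use PHP and this tripped me up a few times, I thought I'd mention it here) that all your script files are UTF-8 (without BOM), and that you use: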
mysql_set_charset('utf8', $connection);
Et voilà... perfectly repaired data, in all languages :)
Answer 9:
In addition to the answers by Raul Avila Solano and acseven, if you want to fix all the broken characters in a single update query:
update `table` set field = replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(replace(field,'ü','ü'),'ô','ô'),'ó','ó'),'ê','ê'),'à','à'),'ã','ã'),'Ü','Ü'),'Ô','Ô'),'Ó','Ó'),'Ê','Ê'),'À','À'),'Ã','Ã'),'Ç','Ç'),'Ú','Ú'),'Õ','Õ'),'Í','Í'),'Í','Í'),'É','É'),'Â','Â'),'Á','Á'),'ç','ç'),'ú','ú'),'õ','õ'),'í','í'),'é','é'),'â','â'),'á','á'),'ã','ã'),'ç','ç'),'à ','à'),'à ','à'),'º','º'),'ª','ª'),'ç','ç'),'–','–'),'ó','ó'),'é','é'),'á','á'),'ê','ê'),'ã','ã'),'â','â'),'Ã','í'),'õ','õ'),'Ø','Ø'),'•','-'),'ú','ú'),'à ','À'),'Ã','Ã'),'Ç','Ç'),'â€','"'),'“','"'),'É','É');
Answer 10:
This also solved my problem with some Italian characters:
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¡','á');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã¤','ä');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í©','é');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã³','ó');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ãº','ú');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã±','ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í‘','Ñ');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Ã','í');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â€“','–');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€™','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¦','...');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€“','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€œ','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€','"');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€˜','\'');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¢','-');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name`,'â€¡','c');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'Â','');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í ','à');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¨','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'íˆ','È');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'â‚¬','€');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'eÌ€','è');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í²','ò');
UPDATE `table_name` SET `column_name` = REPLACE(`column_name` ,'í¹','ù');
Answer 11:
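You may have rows with correctly encoded UTF-8 alongside rows with wrongly encoded characters. In that case, "CONVERT(BINARY CONVERT(post_title USING latin1) USING utf8)" will truncate some fields.
In the end I did it this way: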
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ä" USING latin1),'ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ö" USING latin1),'ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ü" USING latin1),'ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ä" USING latin1),'Ä');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ö" USING latin1),'Ö');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "Ü" USING latin1),'Ü');
update `table` set `name` = replace(`name` ,CONVERT(BINARY "ß" USING latin1),'ß');
Answer 12:
Based on the data in this post https://www.i18nqa.com/debug/utf8-debug.html I suggest this is a good query for identifying dodgy entries and their probable correct values:
SELECT my_field,CONVERT(BINARY CONVERT(my_field USING latin1) USING utf8mb4) AS new_field_value FROM my_table WHERE my_field REGEXP '[âÆËÅÂÃ]';
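Be very careful: we had file names with bad encoding but paths with correct encoding, and in that case some of the solutions above will cause a world of pain. If some of your data is already correctly encoded in UTF-8, you may find that you lose a chunk of it.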
Answer 13:
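Since TEXT and BLOB are the same, the intermediate step may not be necessary.
This had the effect of correcting all the badly encoded data, but the data that was previously correct was truncated at the first multibyte character.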
Answer 14:
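There is a nice script that automates the conversion process for a whole database. It's also useful to know that MySQL's utf8 implementation is incomplete, as it only supports UTF-8 characters of up to 3 bytes. The solution is to use the utf8mb4 character set, introduced in MySQL 5.5.3.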
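To check whether existing text would even fit in the 3-byte utf8 character set, one can look for characters that take four bytes in UTF-8. A small Python sketch; `needs_utf8mb4` is a hypothetical helper name, not something provided by MySQL or any library.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character needs four bytes in UTF-8 (outside the Basic Multilingual Plane)."""
    return any(len(ch.encode("utf-8")) == 4 for ch in text)


samples = ["café", "日本語", "smile \U0001F600"]
for value in samples:
    print(repr(value), needs_utf8mb4(value))
```

Accented Latin letters take two bytes and most CJK characters three, so they fit either way; emoji and other supplementary characters are the ones that require utf8mb4.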
Answer 15:
This is an extension of @Thales Ceolin's answer, to modify every table in the DB:
select concat(
"update ",
a.TABLE_NAME,
" set ", b.COLUMN_NAME,
" = CONVERT(BINARY CONVERT(",
b.COLUMN_NAME,
" USING latin1) USING utf8) where ",
b.COLUMN_NAME,
" is not null;") query
from INFORMATION_SCHEMA.TABLES a
left join INFORMATION_SCHEMA.COLUMNS b on a.TABLE_SCHEMA = b.TABLE_SCHEMA and a.TABLE_NAME = b.TABLE_NAME
where a.table_schema = 'db_name'
and a.TABLE_TYPE = 'BASE TABLE'
and b.data_type in ('text', 'varchar')
and a.TABLE_NAME = 'table_name';
This results in:
update table_name set idn = CONVERT(BINARY CONVERT(idn USING latin1) USING utf8) where idn is not null;
update table_name set name = CONVERT(BINARY CONVERT(name USING latin1) USING utf8) where name is not null;
update table_name set primary_last_name = CONVERT(BINARY CONVERT(primary_last_name USING latin1) USING utf8) where primary_last_name is not null;
Answer 16:
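Since the main question was about detecting the broken characters, here is my solution (it also prevents double conversion of text that is already in the right charset):
- Detect (latin1 to utf8)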
SELECT name FROM %table%
WHERE
CONVERT(CONVERT(name USING BINARY) USING utf8 ) != CONVERT(CONVERT(CONVERT(CONVERT(name USING BINARY) USING latin1) USING BINARY) USING utf8);
- Update (latin1 to utf8)
UPDATE %table% SET name = convert(cast(convert(name using latin1 ) as binary) using utf8 )
WHERE
CONVERT(CONVERT(name USING BINARY) USING utf8 ) != CONVERT(CONVERT(CONVERT(CONVERT(name USING BINARY) USING latin1) USING BINARY) USING utf8);
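The same "only touch rows that are actually double-encoded" idea can be sketched in application code. This is a Python approximation, not a translation of the query above: it assumes the damage is UTF-8 read as cp1252/latin1, and `fix_if_double_encoded` and `_to_bytes` are names invented for the example. A value is changed only when its characters map back to bytes that form valid UTF-8; otherwise it is returned untouched.

```python
def _to_bytes(text: str) -> bytes:
    """Map each character back to the single latin1/cp1252 byte it came from."""
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp1252")
        except UnicodeEncodeError:
            # Falls back to the raw code for control characters like U+0081;
            # raises UnicodeEncodeError for anything that never was one byte.
            out += ch.encode("latin-1")
    return bytes(out)


def fix_if_double_encoded(text: str) -> str:
    """Repair double-encoded text; leave correctly encoded text as it is."""
    try:
        return _to_bytes(text).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


for value in ["niÃ±o", "niño", "日本語", "plain"]:
    print(repr(value), "->", repr(fix_if_double_encoded(value)))
```

A correctly stored string that happens to look like double-encoded text (for instance one that really contains "Ã©") would still be converted, so it seems wise to review the detected rows before updating.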
Source: Detecting utf8 broken characters in MySQL