Character Encoding Problem

Posted 2019-08-27 14:11

Question:

I know this sounds really silly, but what character encoding should I use for text that looks like this in UTF-8?

�� Ã�¼Ã��Ã�½Ã�±Ã�¼Ã�Â

The website is in English. This is user-generated content that is stored in a database with the utf8_general_ci collation and displayed on the screen. I just want to display it properly. What do I have to do?

OK, this is roughly what the original text was:

I αм iиvisibłє łiкє αiя--- I αм αs iмρøяŧαиŧ αs øxygєи--- I αм łiviиg iи ŧЋє wøяłd øƒ мy dяєαмz I αм αłwαys ŧЋєяє ŧø Ћєłρ øŧЋєяz--- I αм busy buŧ иєvєя igиøяє αиy øиє I αм ŧЋє øиє wЋø cαяєz--- I łøvє ŧø sєє øŧЋєя łαugЋiиg I αм ŧЋє øиє wЋø bøяяøw øŧЋєяz søяяøw I αм ŧЋє øиє wЋøz иαugЋŧy buŧ иicє I αм łøsŧ iи мy ŧЋøugЋŧs--- I łøvє ŧø ŧαłк--- I łøvє ŧø sЋαяє--- I αм яєαdy ŧø gø αиy wЋєяє--- I łøvє ŧø ƒły buŧ døи’ŧ Ћαvє wiиgs— I wαиŧ ŧøø ŧøucЋ ŧЋє sкy łiмiŧs--- I αм єvił buŧ иøŧ dєvił--- I иєvєя ƒøłłøw αиy ŧяєиd--- I αм ƒuиłøviиg--- suм ŧiмє łøvє ŧø bє αłøиє--- I łøvє ŧø łivє---
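Output like the garbled string above is typical of text whose UTF-8 bytes were decoded with a single-byte encoding somewhere along the way. The minimal Python sketch below reproduces that effect on a fragment of the original text. It assumes Windows-1252 (cp1252) as the wrong encoding, which is only a guess, since the question does not say which one was involved; the helper names garble and ungarble are hypothetical.

```python
def garble(text: str, wrong_charset: str = "cp1252") -> str:
    """Simulate the bug: UTF-8 bytes displayed as if they were in wrong_charset."""
    return text.encode("utf-8").decode(wrong_charset)


def ungarble(garbled: str, wrong_charset: str = "cp1252") -> str:
    """Undo garble(); this only works if no bytes were lost or replaced on the way."""
    return garbled.encode(wrong_charset).decode("utf-8")


if __name__ == "__main__":
    sample = "I αм"
    broken = garble(sample)
    print(broken)            # I Î±Ð¼
    print(ungarble(broken))  # I αм
```

The reversal in ungarble is only possible while every original byte survives; once a byte has been swapped for a replacement character, it cannot be undone.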

Answer 1:

Using UTF-8 is just fine, but here are a few things to check.

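If you are using MySQL, set the collation of the database, tables, and columns to utf8_unicode_ci.

If you are using PHP, run mysql_query('SET NAMES utf8'); before your queries (the old mysql_* functions were removed in PHP 7; with mysqli, call mysqli_set_charset() on the connection instead).

And in the HTML output, use: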

<meta http-equiv="content-type" content="text/html; charset=utf-8" />
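The point of this checklist is that every layer (storage, connection, page) agrees on UTF-8. The Python sketch below covers only the last layer: it builds a page carrying the same meta tag and encodes the bytes it sends as UTF-8, escaping the user-generated text on the way. render_page is a hypothetical helper; in a real application the database side is handled by the collation and connection settings above, and the HTTP Content-Type header should declare the same charset.

```python
from html import escape

# Meta tag declaring UTF-8, as in the answer.
META = '<meta http-equiv="content-type" content="text/html; charset=utf-8" />'


def render_page(user_text: str) -> bytes:
    """Build an HTML page whose declared charset matches the bytes actually produced."""
    page = (
        "<!DOCTYPE html>\n<html><head>" + META + "</head>"
        "<body><p>" + escape(user_text) + "</p></body></html>"
    )
    # The page promises UTF-8, so encode it as UTF-8 and nothing else.
    return page.encode("utf-8")


if __name__ == "__main__":
    print(render_page("I αм iиvisibłє łiкє αiя").decode("utf-8"))
```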


Answer 2:

It might be more than a problem of choosing a display character set. Unfortunately, that string contains a lot of replacement characters (�), which indicates that it has already gone through a process where characters were lost because the incoming encoding wasn't understood. Even fragments that look like "ï¿½" are probably the replacement character encoded in UTF-8 (bytes EF BF BD) and then viewed through a single-byte encoding such as Windows-1252.
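A short Python sketch of how that loss happens, assuming once more that the wrong encoding was Windows-1252 (the real one is unknown). Bytes that the wrong codec cannot map become U+FFFD, and after that no re-encoding brings the original character back. The last line shows why "ï¿½" points to a replacement character that was itself stored as UTF-8. lossy_view is a hypothetical name.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER, displayed as �


def lossy_view(text: str, wrong_charset: str = "cp1252") -> str:
    """Decode UTF-8 bytes with the wrong charset, turning undecodable bytes into U+FFFD."""
    return text.encode("utf-8").decode(wrong_charset, errors="replace")


if __name__ == "__main__":
    # "я" is D1 8F in UTF-8; byte 0x8F has no mapping in Windows-1252.
    damaged = lossy_view("я")
    print(damaged)  # Ñ followed by the replacement character
    try:
        damaged.encode("cp1252").decode("utf-8")
    except UnicodeEncodeError:
        print("cannot recover: U+FFFD has no Windows-1252 byte")
    # The replacement character itself, stored as UTF-8 and read as Windows-1252:
    print(REPLACEMENT.encode("utf-8").decode("cp1252"))  # ï¿½
```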

To check the quality of the data in the database, could you append the output of, say, select charset(colname), hex(left(colname, 20)) to the question?
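Once you have the hex dump, the bytes can be sorted into rough categories. A heuristic Python sketch, assuming the hex string has been copied out of the query result; the function and category names are hypothetical. The double-encoding test only covers the common case of UTF-8 bytes being re-read as Windows-1252 or Latin-1 and then encoded to UTF-8 a second time, and legitimate text that happens to contain sequences such as "Ã©" would be misreported as double-encoded.

```python
def classify(raw: bytes) -> str:
    """Heuristically label raw column bytes (e.g. copied from MySQL's HEX() output)."""
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return "invalid"  # not UTF-8 at all, or cut off mid-character
    if "\ufffd" in text:
        return "lost"  # replacement characters: the original bytes are gone
    for codec in ("cp1252", "latin-1"):
        try:
            # Undo one suspected round of "read UTF-8 as codec, save as UTF-8".
            repaired = text.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
        if repaired != text:
            return "double-encoded"
    return "ok"


if __name__ == "__main__":
    for hex_dump in ("ceb1", "c38ec2b1", "efbfbd"):
        print(hex_dump, classify(bytes.fromhex(hex_dump)))
```

For example, α stored correctly is CE B1, while the same letter double-encoded shows up as C3 8E C2 B1.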



Answer 3:

Just keep it in UTF-8.



Answer 4:

Users on your site could be entering text in encodings other than UTF-8, such as Big5 or JIS. This is a problem: you need to either enforce that they enter UTF-8, or somehow detect the character set they've used and convert it to UTF-8. Every locale has a default character set. For example, if a user tells you that they want a Japanese interface, they are likely using something like JIS, and you might be able to convert JIS to UTF-8 on the way in and UTF-8 back to JIS on the way out. If you can't convert, just make sure you put a UTF-8 directive in your page's meta tag (if your interface is HTML), and enforce that only UTF-8 text makes it into your database.
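The convert-in, convert-out approach could look roughly like this in Python. This is a sketch under assumptions: the client's charset is taken as already known (for example from the user's locale, as the answer suggests), Shift_JIS stands in for "JIS" (Python also ships euc_jp and iso2022_jp codecs), and the function names are hypothetical. Decoding is strict, so bad input raises an error instead of quietly turning into replacement characters.

```python
def to_utf8(raw: bytes, source_charset: str) -> bytes:
    """On the way in: decode with the client's charset, store as UTF-8."""
    return raw.decode(source_charset).encode("utf-8")  # strict: bad bytes raise


def from_utf8(stored: bytes, target_charset: str) -> bytes:
    """On the way out: convert stored UTF-8 to the locale's charset.
    Raises UnicodeEncodeError if a character has no mapping there."""
    return stored.decode("utf-8").encode(target_charset)


def require_utf8(raw: bytes) -> str:
    """The 'enforce UTF-8' alternative: reject anything that is not valid UTF-8."""
    return raw.decode("utf-8")


if __name__ == "__main__":
    sjis = "日本語".encode("shift_jis")
    stored = to_utf8(sjis, "shift_jis")
    print(stored.decode("utf-8"))
    print(from_utf8(stored, "shift_jis") == sjis)
```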



Answer 5:

You may want to use the following PHP conversion functions for handling UTF-8:

utf8_decode (UTF-8 to ISO-8859-1)
utf8_encode (ISO-8859-1 to UTF-8)
iconv (between arbitrary character sets)
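For readers working outside PHP, the Python sketch below gives rough counterparts of these three functions. The names are hypothetical and the behaviour only approximates PHP's: for instance, PHP's utf8_decode also turns malformed UTF-8 into "?", whereas this version raises UnicodeDecodeError.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Rough counterpart of PHP's utf8_encode(): ISO-8859-1 bytes to UTF-8 bytes."""
    return raw.decode("latin-1").encode("utf-8")


def utf8_to_latin1(raw: bytes) -> bytes:
    """Rough counterpart of PHP's utf8_decode(): UTF-8 to ISO-8859-1,
    with characters outside Latin-1 replaced by '?'."""
    return raw.decode("utf-8").encode("latin-1", errors="replace")


def convert(raw: bytes, from_charset: str, to_charset: str) -> bytes:
    """Rough counterpart of PHP's iconv($from, $to, $string) for codecs Python knows."""
    return raw.decode(from_charset).encode(to_charset)


if __name__ == "__main__":
    print(latin1_to_utf8(b"caf\xe9"))                # b'caf\xc3\xa9'
    print(utf8_to_latin1("café α".encode("utf-8")))  # b'caf\xe9 ?'
    print(convert(b"\x93hi\x94", "cp1252", "utf-8").decode("utf-8"))
```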