Convert MySQL data from Latin1 to UTF8 [duplicate]

Posted 2019-09-07 09:02

This is a common question that has been asked many times. However, I still cannot find the right answer on Google.

In my web app there is a form for collecting data, and the app collects all data in UTF-8. However, the character set of the schema and the table was mistakenly set to latin1. Moreover, "SET NAMES UTF8" was issued on the connection.

Now some of the Chinese data always shows as question marks (?), no matter which conversion method I use. Querying the problem columns as binary also shows that the data is several bytes of 3f, meaning several '?'s.

Can my data still be converted to UTF-8 and displayed correctly, or is it already lost?

[UPDATE]

This is not the same question as "How to convert an entire MySQL database characterset and collation to UTF-8?", because I have not only converted the entire database and table to UTF-8 but also run mysqldump and re-imported the dump into the database. Neither approach worked.

[UPDATE 2]

The problem is not just about converting the table character set; you also need to understand how the UTF-8 and Latin1 encodings work.

The basics are:

Latin1 uses exactly 1 byte (8 bits) per character.

UTF-8 is a variable-length encoding, so a character MAY take more than 1 byte.

A single-byte UTF-8 character uses its high bit as a marker, so only 7 bits are left for the character itself. Those 7-bit characters (ASCII) are encoded identically in Latin1 and UTF-8, so they are stored in a Latin1 column without any problem. Characters beyond that range survive only if Latin1 itself has a code for them.

Chinese and Japanese characters need 3 bytes in UTF-8 (occasionally 4), and Latin1 has no code for any of them. Because the connection was declared as UTF-8 while the column was Latin1, MySQL converted each character to Latin1 when storing it, and that conversion is where the data was damaged.
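To make the byte counts concrete, here is a small Python sketch. It is only an illustration: Python's "latin-1" codec is ISO-8859-1, which is close to but not identical to MySQL's latin1, and the helper name `utf8_length` and the sample characters are chosen for this example.

```python
# Compare how each sample character is encoded in UTF-8 and in Latin-1.
# Assumption: Python's "latin-1" codec stands in for MySQL's latin1 here.

def utf8_length(ch: str) -> int:
    """Number of bytes the character occupies in UTF-8."""
    return len(ch.encode("utf-8"))

# "A", e-acute, and two CJK characters (U+4E2D, U+65E5)
samples = ["A", "\u00e9", "\u4e2d", "\u65e5"]

for ch in samples:
    try:
        latin1_hex = ch.encode("latin-1").hex()
    except UnicodeEncodeError:
        latin1_hex = "no Latin-1 code"
    print(repr(ch), "utf-8:", ch.encode("utf-8").hex(),
          "bytes:", utf8_length(ch), "latin-1:", latin1_hex)
```

The ASCII letter takes one byte in both encodings, the accented letter takes two bytes in UTF-8 but still has a Latin-1 code, and the CJK characters take three bytes in UTF-8 and have no Latin-1 code at all.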

That is why it still shows '?' no matter how I change the character set of the database and the table: during conversion to Latin1, every character outside its range is replaced with '?' (3F in hex), so the original bytes are no longer in the table.
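The replacement with '?' can be reproduced outside MySQL. The following Python sketch only mimics the behaviour described above; `store_as_latin1` is a hypothetical helper, and Python's "latin-1" codec with `errors="replace"` is used as an approximation of the server-side conversion, not as MySQL's actual implementation.

```python
def store_as_latin1(text: str) -> bytes:
    """Mimic a lossy conversion to Latin-1: unmappable characters become b'?'."""
    return text.encode("latin-1", errors="replace")

stored = store_as_latin1("abc\u4e2d\u6587")  # "abc" followed by two Chinese characters
print(stored.hex())              # 6162633f3f
print(stored.decode("latin-1"))  # abc??
```

Once the bytes are 3f, nothing distinguishes one lost character from another, so no later change of character set can bring the original text back.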

1 answer
疯言疯语
#2 · 2019-09-07 09:26

Just change the character set of the entire database:

ALTER DATABASE databasename CHARACTER SET utf8 COLLATE utf8_unicode_ci;

You can of course do the same for individual tables.

Furthermore, have a look at the MySQL documentation.

EDIT:

Otherwise, if your data is already stored as "?" marks, the reality is that it is damaged.
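One way to tell the two situations apart is to look at the raw bytes of a column, for example the hex string MySQL returns from HEX(column). The Python sketch below is a rough heuristic under that assumption; `classify_hex` and its labels are hypothetical names for this example.

```python
def classify_hex(hex_value: str) -> str:
    """Rough guess at what a column's raw bytes (given as hex) contain."""
    raw = bytes.fromhex(hex_value)
    if raw and all(b == 0x3F for b in raw):
        return "lost"        # nothing but '?' bytes: the original text is gone
    if any(b >= 0x80 for b in raw):
        try:
            raw.decode("utf-8")
            return "utf8-bytes"  # intact UTF-8 bytes; relabelling may recover them
        except UnicodeDecodeError:
            return "other"       # some other single-byte encoding, e.g. Latin-1 text
    return "ascii"               # plain 7-bit text (may still contain literal '?')

print(classify_hex("3F3F3F"))        # lost
print(classify_hex("E4B8ADE69687"))  # utf8-bytes
```

A value made only of 3F bytes matches the damaged case described in this answer. A value that still decodes as UTF-8 would point to a different situation (the bytes were stored unchanged), which is not what the question reports.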
