MySQL C# Text Encoding Problems

2020-01-28 09:08发布

问题:

I have an old MySQL database with encoding set to UTF-8. I am using Ado.Net Entity framework to connect to it.

The string that I retrieve from it have strange characters when ë like characters are expected.

For example: "ë" is "ë".

I thought I could get this right by converting from UTF8 to UTF16.

 return Encoding.Unicode.GetString(                
            Encoding.Convert(
            Encoding.UTF8,
            Encoding.Unicode,
            Encoding.UTF8.GetBytes(utf8)));
    }

This however doesn't change a thing.

How could I get the data from this database in proper form?

回答1:

There are two things that you need to do to support UTF-8 in the ADO.NET Entity frame work (or in general using the MySQL .NET Connector):

  1. Ensure that the collation of your database of table is a UTF-8 collation (i.e. utf8_general_ci or one of its relations)
  2. Add Charset=utf8; to your connection string.

    "Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;"
    

I'm not certain, but the encoding may be case sensitive; I found that CharSet=UTF8; did not work for me.



回答2:

Even if the database is set to UTF8 you must do the following things to get Unicode fields to work correctly:

  1. Ensure you are using a Unicode field type like NVARCHAR or TEXT CHARSET utf8
  2. Whenever you insert anything into the field you must prefix it with the N character to indicate Unicode data as shown in the examples below
  3. Whenever you select based on Unicode data ensure you use the N prefix again

MySqlCommand cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (N'Unicode Data')");

MySqlCommand cmd2 = new MySqlCommand("SELECT * FROM EXAMPLE WHERE someField=N'Unicode Data'");

If the database wasn't configured correctly or the data was inserted without using the N prefix it won't be possible to get the correct data out since it will have been downcast into the Latin 1/ASCII character set



回答3:

Try set the encoding by "set names utf8" query. You can set this parameter in mysql config too.



回答4:

As others have said this could be a db issue, but it could also be caused by using an old version of the .net mysql connector.

What I actually wanted to comment on was the utf8 to utf16 conversion. The string you are trying to convert is actually alreay unicode encoded, so your "ë" characters actually takes up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is the reason why your conversion doesn't do anything. If you want to do a conversion like that I think you would have to encode your utf8 string as a old style 1 byte per character string, using a codepage where the byte values of à and « actually represent the utf8 byte sequence of ë and then treat the bytes of this new string as an utf8 string. Fun stuff.



回答5:

thank you The Mouth of a Cow , your solution works but still we need converting characters. i think this is your problem :) and for converting characters you can use this code

 System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;

 string s = "unicode";

 //string to utf
 byte[] utf = System.Text.Encoding.UTF8.GetBytes(s);

 //utf to string
 string s2= System.Text.Encoding.UTF8.GetString(utf);