I have an old MySQL database with encoding set to UTF-8. I am using Ado.Net Entity framework to connect to it.
The string that I retrieve from it have strange characters when ë like characters are expected.
For example: "ë" is "ë".
I thought I could get this right by converting from UTF8 to UTF16.
return Encoding.Unicode.GetString(
Encoding.Convert(
Encoding.UTF8,
Encoding.Unicode,
Encoding.UTF8.GetBytes(utf8)));
}
This however doesn't change a thing.
How could I get the data from this database in proper form?
There are two things that you need to do to support UTF-8 in the ADO.NET Entity frame work (or in general using the MySQL .NET Connector):
- Ensure that the collation of your database of table is a UTF-8 collation (i.e.
utf8_general_ci
or one of its relations)
Add Charset=utf8;
to your connection string.
"Server=localhost;Database=test;Uid=test;Pwd=test;Charset=utf8;"
I'm not certain, but the encoding may be case sensitive; I found that CharSet=UTF8;
did not work for me.
Even if the database is set to UTF8 you must do the following things to get Unicode fields to work correctly:
- Ensure you are using a Unicode field type like NVARCHAR or TEXT CHARSET utf8
- Whenever you insert anything into the field you must prefix it with the N character to indicate Unicode data as shown in the examples below
- Whenever you select based on Unicode data ensure you use the N prefix again
MySqlCommand cmd = new MySqlCommand("INSERT INTO EXAMPLE (someField) VALUES (N'Unicode Data')");
MySqlCommand cmd2 = new MySqlCommand("SELECT * FROM EXAMPLE WHERE someField=N'Unicode Data'");
If the database wasn't configured correctly or the data was inserted without using the N prefix it won't be possible to get the correct data out since it will have been downcast into the Latin 1/ASCII character set
Try set the encoding by "set names utf8" query. You can set this parameter in mysql config too.
As others have said this could be a db issue, but it could also be caused by using an old version of the .net mysql connector.
What I actually wanted to comment on was the utf8 to utf16 conversion. The string you are trying to convert is actually alreay unicode encoded, so your "ë" characters actually takes up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is the reason why your conversion doesn't do anything.
If you want to do a conversion like that I think you would have to encode your utf8 string as a old style 1 byte per character string, using a codepage where the byte values of à and « actually represent the utf8 byte sequence of ë and then treat the bytes of this new string as an utf8 string. Fun stuff.
thank you The Mouth of a Cow ,
your solution works but still we need converting characters.
i think this is your problem :)
and for converting characters you can use this code
System.Text.Encoding utf_8 = System.Text.Encoding.UTF8;
string s = "unicode";
//string to utf
byte[] utf = System.Text.Encoding.UTF8.GetBytes(s);
//utf to string
string s2= System.Text.Encoding.UTF8.GetString(utf);