I have an old MySQL database with encoding set to UTF-8. I am using Ado.Net Entity framework to connect to it.
The string that I retrieve from it have strange characters when ë like characters are expected.
For example: "ë" is "ë".
I thought I could get this right by converting from UTF8 to UTF16.
return Encoding.Unicode.GetString(
Encoding.Convert(
Encoding.UTF8,
Encoding.Unicode,
Encoding.UTF8.GetBytes(utf8)));
}
This however doesn't change a thing.
How could I get the data from this database in proper form?
There are two things that you need to do to support UTF-8 in the ADO.NET Entity frame work (or in general using the MySQL .NET Connector):
utf8_general_ci
or one of its relations)Add
Charset=utf8;
to your connection string.I'm not certain, but the encoding may be case sensitive; I found that
CharSet=UTF8;
did not work for me.thank you The Mouth of a Cow , your solution works but still we need converting characters. i think this is your problem :) and for converting characters you can use this code
Try set the encoding by "set names utf8" query. You can set this parameter in mysql config too.
As others have said this could be a db issue, but it could also be caused by using an old version of the .net mysql connector.
What I actually wanted to comment on was the utf8 to utf16 conversion. The string you are trying to convert is actually alreay unicode encoded, so your "ë" characters actually takes up 4 bytes (or more) and are no longer, at the point of your conversion, a misrepresentation of the "ë" character. That is the reason why your conversion doesn't do anything. If you want to do a conversion like that I think you would have to encode your utf8 string as a old style 1 byte per character string, using a codepage where the byte values of à and « actually represent the utf8 byte sequence of ë and then treat the bytes of this new string as an utf8 string. Fun stuff.
Even if the database is set to UTF8 you must do the following things to get Unicode fields to work correctly:
If the database wasn't configured correctly or the data was inserted without using the N prefix it won't be possible to get the correct data out since it will have been downcast into the Latin 1/ASCII character set