I need to convert a string to UTF-8 in C#. I've already try many ways but none works as I wanted. I converted my string into a byte array and then to try to write it to an XML file (which encoding is UTF-8....) but either I got the same string (not encoded at all) either I got a list of byte which is useless.... Does someone face the same issue ?
Edit : This is some of the code I used :
str= "testé";
byte[] utf8Bytes = Encoding.UTF8.GetBytes(str);
return Encoding.UTF8.GetString(utf8Bytes);
The result is "testé" or I expected something like "testé"...
In the original post author concatenated strings. Every sting operation will result in string recreation in .Net. String is effectively a reference type. As a result, the function provided will be visibly slow. Don't do that. Use array of chars instead, write there directly and then convert result to string. In my case of processing 500 kb of text difference is almost 5 minutes.
A string in C# is always UTF-16, there is no way to "convert" it. The encoding is irrelevant as long as you manipulate the string in memory, it only matters if you write the string to a stream (file, memory stream, network stream...).
If you want to write the string to a XML file, just specify the encoding when you create the
XmlWriter
If you want a UTF8 string, where every byte is correct ('Ö' -> [195, 0] , [150, 0]), you can use the followed:
In my case the DLL request is a UTF8 string too, but unfortunately the UTF8 string must be interpreted with UTF16 encoding ('Ö' -> [195, 0], [19, 32]). So the ANSI '–' which is 150 has to be converted to the UTF16 '–' which is 8211. If you have this case too, you can use the following instead:
Or the Native-Method:
If you need it the other way around, see Utf8ToUtf16. Hope I could be of help.
does this example help ?
}
Check the Jon Skeet answer to this other question: UTF-16 to UTF-8 conversion (for scripting in Windows)
It contains the source code that you need.
Hope it helps.