Is Encoding.Unicode just a name for UTF-16? Then why is it called just Unicode instead of UTF16?
In the encoding documentation Microsoft states that for most scenarios and applications you should avoid using Encoding.ASCII and Encoding.Default.
When using System.Text.Encoding, in most scenarios should I be using Encoding.Unicode or Encoding.UTF8?
It comes from the early days of Unicode. Unicode 1.0 was a 16-bit encoding, as it was assumed that 65,536 code points would be sufficient. Unicode 2.0 abandoned this restriction; however, early adopters of Unicode, including Microsoft, named their encoding "Unicode", and the name has stuck.
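To see what "abandoning the 16-bit restriction" means in practice: UTF-16 stores a code point above U+FFFF as a pair of 16-bit "surrogate" code units, which the original 16-bit design (UCS-2) could not express. The sketch below is Python, used only to illustrate the arithmetic; `to_surrogate_pair` and `utf16le_code_units` are hypothetical helper names, and the result is cross-checked against Python's built-in `utf-16-le` codec rather than any .NET API.

```python
def to_surrogate_pair(code_point):
    """Split a code point in U+10000..U+10FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("only supplementary code points need a surrogate pair")
    offset = code_point - 0x10000      # a 20-bit value
    high = 0xD800 + (offset >> 10)     # top 10 bits -> high (lead) surrogate
    low = 0xDC00 + (offset & 0x3FF)    # bottom 10 bits -> low (trail) surrogate
    return high, low


def utf16le_code_units(text):
    """Return the 16-bit code units Python's utf-16-le codec produces for text."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


if __name__ == "__main__":
    grinning_face = 0x1F600  # above U+FFFF, so it cannot fit in one 16-bit unit
    print([hex(u) for u in to_surrogate_pair(grinning_face)])           # ['0xd83d', '0xde00']
    print([hex(u) for u in utf16le_code_units(chr(grinning_face))])     # ['0xd83d', '0xde00']
```

Both lines print the same pair, which is why a single 16-bit `char` no longer always corresponds to a single character.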
Nowadays you should be using UTF-8 unless you have a specific reason not to, e.g. legacy software you need to integrate with.
The reason for this is that ASCII is binary compatible with UTF-8 (ASCII text encoded as UTF-8 produces exactly the same bytes), and there is a lot of ASCII out there.
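A small sketch of that binary compatibility, again in Python purely for illustration (the helper name `utf8_and_utf16le` is hypothetical). It shows that pure-ASCII text has identical bytes in ASCII and UTF-8, while UTF-16LE (the layout of .NET's Encoding.Unicode) doubles the size.

```python
def utf8_and_utf16le(text):
    """Return (UTF-8 bytes, UTF-16LE bytes) for text, without a BOM."""
    return text.encode("utf-8"), text.encode("utf-16-le")


if __name__ == "__main__":
    sample = "Hello, world!"
    utf8, utf16 = utf8_and_utf16le(sample)
    # For pure-ASCII text the UTF-8 bytes are exactly the ASCII bytes...
    print(utf8 == sample.encode("ascii"))   # True
    # ...while UTF-16LE adds a zero byte after each of these characters.
    print(len(utf8), len(utf16))            # 13 26
    # An existing ASCII byte stream is therefore already valid UTF-8.
    print(b"legacy ascii".decode("utf-8"))  # legacy ascii
    # Non-ASCII characters use only bytes >= 0x80 in UTF-8, so they can
    # never be mistaken for ASCII characters.
    print(list("é".encode("utf-8")))        # [195, 169]
```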
Yes. Specifically, it is little-endian UTF-16.
Encoding has a separate BigEndianUnicode property for big-endian UTF-16.
For historical reasons. Microsoft was one of the first companies to adopt Unicode, so it had a "Unicode" implementation in Windows way back in the early days of Unicode, before UTF-16 was invented. "Unicode" is Microsoft's de facto name for whatever its native Unicode encoding is, which used to be UCS-2 and is now UTF-16.
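To make the little-endian vs big-endian distinction between Encoding.Unicode and Encoding.BigEndianUnicode concrete, here is a Python sketch of the two byte orders. Python's `utf-16-le` and `utf-16-be` codecs stand in for the two .NET properties (without a byte order mark); `utf16_bytes` and `detect_utf16_bom` are hypothetical helpers, and the BOM check is a simplification, not how .NET detects encodings.

```python
import codecs


def utf16_bytes(text, big_endian=False):
    """Encode text as UTF-16 in the requested byte order, without a BOM."""
    return text.encode("utf-16-be" if big_endian else "utf-16-le")


def detect_utf16_bom(data):
    """Guess UTF-16 byte order from a leading byte order mark (BOM).

    Simplification: this ignores UTF-32, whose little-endian BOM also
    begins with FF FE.
    """
    if data.startswith(codecs.BOM_UTF16_LE):   # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):   # FE FF
        return "utf-16-be"
    return None


if __name__ == "__main__":
    print(utf16_bytes("Hi").hex(" "))                   # 48 00 69 00
    print(utf16_bytes("Hi", big_endian=True).hex(" "))  # 00 48 00 69
    print(detect_utf16_bom(codecs.BOM_UTF16_LE + utf16_bytes("Hi")))  # utf-16-le
```

The same two characters produce the same code units in both cases; only the order of the bytes within each 16-bit unit differs, which is exactly the endianness issue UTF-8 avoids.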
That really depends on your particular scenarios. Use whichever encoding suits your needs. Both encodings have strengths and weaknesses.
UTF-8 is commonly used for interoperability in communication protocols, as it doesn't suffer from endianness problems and is largely compatible with most existing text-based protocols. It is also usually smaller than UTF-16 when stored as bytes, especially for text made up mostly of ASCII characters (English and other Latin-script languages, source code, markup).
UTF-16 is usually easier to process in memory than UTF-8, which is why so many libraries and frameworks use it for strings. It can also be smaller than UTF-8 when stored as bytes, especially for East Asian languages: most CJK characters take 2 bytes in UTF-16 but 3 bytes in UTF-8.
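A quick way to check the size trade-off for your own data is to encode a representative sample both ways. The Python sketch below does this for three short strings; `encoded_sizes` is a hypothetical helper and the samples are arbitrary, so treat it as a measuring tool rather than a benchmark.

```python
def encoded_sizes(text):
    """Return (UTF-8 size, UTF-16 size) in bytes, excluding any BOM."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


if __name__ == "__main__":
    samples = {
        "English": "Hello, world",     # ASCII: 1 byte/char in UTF-8, 2 in UTF-16
        "Russian": "Привет",           # Cyrillic: 2 bytes/char in both
        "Japanese": "こんにちは世界",  # CJK (all in the BMP): 3 bytes/char in UTF-8, 2 in UTF-16
    }
    for name, text in samples.items():
        utf8_size, utf16_size = encoded_sizes(text)
        print(f"{name}: UTF-8 {utf8_size} bytes, UTF-16 {utf16_size} bytes")
```

This reports 12 vs 24 bytes for the English sample, 12 vs 12 for Russian, and 21 vs 14 for Japanese. Characters outside the BMP (such as emoji) take 4 bytes in both encodings.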