Encoding.UTF8 or Encoding.Unicode?

Is Encoding.Unicode just a name for UTF-16? Then why is it called just Unicode instead of UTF16?

In the encoding documentation Microsoft states that for most scenarios and applications you should avoid using Encoding.ASCII and Encoding.Default.

When using System.Text.Encoding, should I be using Encoding.Unicode or Encoding.UTF8 in most scenarios?

Tags: .net unicode encoding utf-8

2 answers

唯我独甜

Post #2 · 2019-08-27 04:47

It comes from the early days of Unicode. Unicode 1.0 was a 16-bit encoding, as it was assumed that 65,536 code points would be sufficient. Unicode 2.0 abandoned this restriction. However, early adopters of Unicode, including Microsoft, had already named their encoding "Unicode", and the name has stuck.

Nowadays you should be using UTF-8 unless you have a specific reason not to, e.g. legacy software you need to integrate with.

The reason for this is that ASCII is binary compatible with UTF-8: any ASCII text is already valid UTF-8, byte for byte. And there is a lot of ASCII out there.
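The binary compatibility mentioned above can be checked directly. This is a minimal sketch in Python rather than .NET, chosen only for illustration; the sample string and variable names are my own, and the "utf-16-le" codec is assumed here to stand in for what .NET calls Encoding.Unicode.

```python
# Illustrative sample text containing only ASCII characters.
text = "Hello, world"

ascii_bytes = text.encode("ascii")
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian UTF-16, no byte order mark

# ASCII text produces exactly the same bytes when encoded as UTF-8.
print(ascii_bytes == utf8_bytes)

# UTF-16 uses two bytes per character here, so existing ASCII data
# would not be readable as UTF-16 without conversion.
print(len(utf8_bytes), len(utf16_bytes))
```

So a file or protocol message that was written as ASCII can be read by a UTF-8 decoder unchanged, which is not the case for a UTF-16 decoder.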


混吃等死

Post #3 · 2019-08-27 04:47

Is Encoding.Unicode just a name for UTF-16?

Yes. Specifically, it is little-endian UTF-16. Encoding has a separate BigEndianUnicode property for big-endian UTF-16.
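To make the byte order difference concrete, here is a small sketch in Python (used only as an illustration, not the .NET API). The codec names "utf-16-le" and "utf-16-be" are assumed to correspond to the little-endian and big-endian variants described above.

```python
# A single character, U+0041, is enough to show the byte order.
text = "A"

little = text.encode("utf-16-le")  # low byte first
big = text.encode("utf-16-be")     # high byte first

print(little.hex())  # 4100
print(big.hex())     # 0041

# Decoding with the matching byte order restores the original text.
print(little.decode("utf-16-le") == big.decode("utf-16-be") == text)
```

The same two bytes appear in both results, only in swapped order, which is why a reader has to know (or detect) which variant a UTF-16 stream uses.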

Then why is it called just Unicode instead of UTF16?

For historical reasons. Microsoft was one of the first companies to adopt Unicode, so it had a "Unicode" implementation in Windows in the early days of Unicode, before UTF-16 was invented. "Unicode" is Microsoft's de facto name for whatever its native Unicode encoding is, which used to be UCS-2 and is now UTF-16.

When using System.Text.Encoding, should I be using Encoding.Unicode or Encoding.UTF8 in most scenarios?

That really depends on your particular scenarios. Use whichever encoding suits your needs. Both encodings have strengths and weaknesses.

UTF-8 is commonly used for interoperability in communications protocols, as it doesn't suffer from endian problems and is compatible with most existing text-based protocols. It is also usually smaller for byte storage than UTF-16 when the text is dominated by ASCII characters, which covers most Western languages and most markup.

UTF-16 is usually easier to process in memory than UTF-8, which is why so many libraries and frameworks use it for strings. It can also be smaller for byte storage than UTF-8, especially for East Asian languages, where most characters take three bytes in UTF-8 but two in UTF-16.
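The size trade-off between the two encodings can be measured with a short sketch. It is written in Python purely for illustration; the helper name encoded_sizes and the sample strings are hypothetical, and "utf-16-le" is assumed to stand in for Encoding.Unicode.

```python
def encoded_sizes(text: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for the given text."""
    return len(text.encode("utf-8")), len(text.encode("utf-16-le"))


# Hypothetical samples: ASCII-only text, East Asian text, and one
# character outside the Basic Multilingual Plane.
samples = {
    "English": "Hello, world",
    "Japanese": "こんにちは世界",
    "Emoji": "😀",
}

for label, text in samples.items():
    utf8_size, utf16_size = encoded_sizes(text)
    print(f"{label}: UTF-8 = {utf8_size} bytes, UTF-16 = {utf16_size} bytes")
```

With these samples the English text is half the size in UTF-8, the Japanese text is smaller in UTF-16, and the emoji takes four bytes in both. The last case also shows that UTF-16 is not a fixed-width encoding: characters beyond the first 65,536 code points need a pair of 16-bit units.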
