System.Text.Encoding.UTF8.GetBytes Extra Byte

Posted 2019-08-07 09:28

Why does this line

System.Text.Encoding.UTF8.GetBytes("ABCD±ABCD")

give me back 10 bytes instead of 9, even though ± is char(177)?

Is there a .NET function or encoding that will translate this string correctly into 9 bytes?

Tags: c# .net vb.net utf-8 character-encoding

4 answers

【Aperson】

Reply #2 · 2019-08-07 09:46

"Although ± is char(177)"

And the UTF-8 encoding for that is 0xC2 0xB1, which is two bytes. Every code point >= 128 takes multiple bytes in UTF-8, and the number of bytes depends on the magnitude of the code point: two bytes up to U+07FF, three up to U+FFFF, and four beyond that.

That data is 10 bytes, when encoded with UTF-8. The error here is your expectation that it should take 9.
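A minimal sketch to check this yourself, assuming a plain console project. The class name `Utf8ByteCountDemo` and the helper `Utf8Hex` are illustrative and not part of the question. The string uses the escape `\u00B1` for ± so the result does not depend on how the source file is saved.

```csharp
using System;
using System.Text;

public static class Utf8ByteCountDemo
{
    // Returns the UTF-8 bytes of a string as hyphen-separated hex pairs.
    public static string Utf8Hex(string text)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(text);
        return BitConverter.ToString(bytes);
    }

    public static void Main()
    {
        string s = "ABCD\u00B1ABCD";                       // "ABCD±ABCD"
        Console.WriteLine(s.Length);                       // 9 chars
        Console.WriteLine(Encoding.UTF8.GetByteCount(s));  // 10 bytes
        Console.WriteLine(Utf8Hex(s));                     // 41-42-43-44-C2-B1-41-42-43-44
    }
}
```

The string has 9 characters, but the single character ± contributes the two bytes C2 B1, which accounts for the tenth byte.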


Summer. ? 凉城

Reply #3 · 2019-08-07 09:47

You should use a single-byte encoding such as Windows-1251 to get ± as the single byte 177:

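// On .NET Core / .NET 5+, first call Encoding.RegisterProvider(CodePagesEncodingProvider.Instance).
var bytes = System.Text.Encoding.GetEncoding("Windows-1251").GetBytes("ABCD±ABCD");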
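A slightly fuller sketch of the same idea, assuming .NET 5 or later (on older targets `CodePagesEncodingProvider` comes from the System.Text.Encoding.CodePages NuGet package). The class `SingleByteDemo` and the helper `EncodeSingleByte` are hypothetical names used only for illustration. It registers the code-pages provider before asking for Windows-1251.

```csharp
using System;
using System.Text;

public static class SingleByteDemo
{
    // Encodes text with the named code page. The name should match
    // whatever encoding the consumer of the bytes expects.
    public static byte[] EncodeSingleByte(string text, string encodingName)
    {
        // Makes code pages such as Windows-1251 visible to Encoding.GetEncoding.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(encodingName).GetBytes(text);
    }

    public static void Main()
    {
        byte[] bytes = EncodeSingleByte("ABCD\u00B1ABCD", "Windows-1251");
        Console.WriteLine(bytes.Length);                  // 9
        Console.WriteLine(BitConverter.ToString(bytes));  // 41-42-43-44-B1-41-42-43-44
    }
}
```

This only helps if whoever reads the bytes decodes them with the same code page. A single-byte encoding can represent at most 256 characters, so any character it lacks is replaced on encoding.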


\"骚年 ilove

Reply #4 · 2019-08-07 10:03

± falls outside the ASCII range, so UTF-8 represents it with 2 bytes.
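To illustrate the ASCII boundary, here is a small sketch of what happens if the string is forced through `Encoding.ASCII` instead. The class `AsciiFallbackDemo` and the helper `AsciiHex` are made-up names for this example. With the default settings, the result is 9 bytes, but ± is lost and replaced by `?` (0x3F).

```csharp
using System;
using System.Text;

public static class AsciiFallbackDemo
{
    // Returns the ASCII-encoded bytes of a string as hyphen-separated hex pairs.
    public static string AsciiHex(string text)
    {
        return BitConverter.ToString(Encoding.ASCII.GetBytes(text));
    }

    public static void Main()
    {
        // ± (U+00B1) has no ASCII representation, so the default
        // replacement fallback substitutes '?' (0x3F).
        Console.WriteLine(AsciiHex("ABCD\u00B1ABCD"));  // 41-42-43-44-3F-41-42-43-44
    }
}
```

So getting 9 bytes this way comes at the cost of silently corrupting the data.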


我命由我不由天

Reply #5 · 2019-08-07 10:03

This video explains UTF-8 encoding nicely: http://www.youtube.com/watch?v=MijmeoH9LT4. After watching it you will realize why it results in more bytes than you thought.
