Why does "\uFFFF"
(which is apparently 2 bytes long) convert to [-17,-65,-65] in UTF-8 and not [-1,-1]?
System.out.println(Arrays.toString("\uFFFF".getBytes(StandardCharsets.UTF_8)));
Is this because UTF-8 uses only 6 bits of every byte for code points larger than 127?
UTF-8 uses a different number of bytes depending on the character being represented. Code points up to 127 are encoded in a single byte that follows the 7-bit ASCII convention, for backwards compatibility. Other characters (such as Chinese characters) can take up to 4 bytes.
As the linked Wikipedia article states, the character you referenced falls in the range that is encoded with three bytes.
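To see the variable length in practice, here is a small sketch (the example characters are my own picks, not from the question) printing how many UTF-8 bytes each one needs:

import java.nio.charset.StandardCharsets;

public class Utf8Lengths {
    public static void main(String[] args) {
        // 1 byte: ASCII range, U+0000..U+007F
        System.out.println("A".getBytes(StandardCharsets.UTF_8).length);            // 1
        // 2 bytes: U+0080..U+07FF, e.g. 'é' (U+00E9)
        System.out.println("\u00E9".getBytes(StandardCharsets.UTF_8).length);       // 2
        // 3 bytes: U+0800..U+FFFF, e.g. the Chinese character '中' (U+4E2D)
        System.out.println("\u4E2D".getBytes(StandardCharsets.UTF_8).length);       // 3
        // 4 bytes: U+10000..U+10FFFF, e.g. '😀' (U+1F600), a surrogate pair in Java
        System.out.println("\uD83D\uDE00".getBytes(StandardCharsets.UTF_8).length); // 4
    }
}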
0xFFFF has the bit pattern 11111111 11111111. Divide the bits up according to UTF-8's three-byte rules and the pattern becomes 1111 111111 111111. Now add UTF-8's prefix bits (1110 on the lead byte, 10 on each continuation byte) and the pattern becomes 11101111 10111111 10111111, which is 0xEF 0xBF 0xBF, i.e. 239 191 191 unsigned, i.e. -17 -65 -65 in two's complement format (which is what Java uses for signed values; Java does not have unsigned data types).
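As a sanity check, here is a sketch (class and variable names are mine) that builds those three bytes by hand with shifts and masks, following the same three-byte layout, and compares the result with getBytes:

import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8ByHand {
    public static void main(String[] args) {
        int cp = 0xFFFF; // the code point of '\uFFFF'

        // Three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx
        byte b0 = (byte) (0xE0 | (cp >> 12));         // 1110 prefix + top 4 bits
        byte b1 = (byte) (0x80 | ((cp >> 6) & 0x3F)); // 10 prefix + middle 6 bits
        byte b2 = (byte) (0x80 | (cp & 0x3F));        // 10 prefix + low 6 bits

        System.out.println(Arrays.toString(new byte[] { b0, b1, b2 }));                 // [-17, -65, -65]
        System.out.println(Arrays.toString("\uFFFF".getBytes(StandardCharsets.UTF_8))); // [-17, -65, -65]
    }
}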