Isn't the size of character in Java 2 bytes?

I used RandomAccessFile to read a byte from a text file.

public static void readFile(RandomAccessFile fr) {
    byte[] cbuff = new byte[1];
    fr.read(cbuff,0,1);
    System.out.println(new String(cbuff));
}

Why am I seeing one full character being read by this?

标签： java string char

7条回答

狗以群分

2楼-- · 2019-01-03 12:37

Looks like your file contains ASCII characters, which are encoded in just 1 byte. If text file was containing non-ASCII character, e.g. 2-byte UTF-8, then you get just the first byte, not whole character.

0人赞添加讨论(0) 举报

劳资没心，怎么记你

3楼-- · 2019-01-03 12:39

In ASCII text file each character is just one byte

0人赞添加讨论(0) 举报

ら.Afraid

4楼-- · 2019-01-03 12:44

Java allocates 2 of 2 bytes for character as it follows UTF-16. It occupies minimum 2 bytes while storing a character, and maximum of 4 bytes. There is no 1 byte or 3 bytes of storage for character.

0人赞添加讨论(0) 举报

叛逆

5楼-- · 2019-01-03 12:45

There are some great answers here but I wanted to point out the jvm is free to store a char value in any size space >= 2 bytes.

On many architectures there is a penalty for performing unaligned memory access so a char might easily be padded to 4 bytes. A volatile char might even be padded to the size of the CPU cache line to prevent false sharing. https://en.wikipedia.org/wiki/False_sharing

It might be non-intuitive to new Java programmers that a character array or a string is NOT simply multiple characters. You should learn and think about strings and arrays distinctly from "multiple characters".

I also want to point out that java characters are often misused. People don't realize they are writing code that won't properly handle codepoints over 16 bits in length.

0人赞添加讨论(0) 举报

看我几分像从前

6楼-- · 2019-01-03 12:52

Java stores all it's "chars" internally as two bytes. However, when they become strings etc, the number of bytes will depend on your encoding.

Some characters (ASCII) are single byte, but many others are multi-byte.

Java supports Unicode, thus according to:

Java Character Docs

The max value supported is "\uFFFF" (hex FFFF, dec 65535), or 11111111 11111111 binary (two bytes).

0人赞添加讨论(0) 举报

聊天终结者

7楼-- · 2019-01-03 12:59

The constructor String(byte[] bytes) takes the bytes from the buffer and encodes them to characters.

It uses the platform default charset to encode bytes to characters. If you know, your file contains text, that is encoded in a different charset, you can use the String(byte[] bytes, String charsetName) to use the correct encoding (from bytes to characters).

0人赞添加讨论(0) 举报

1 2 下一页

Isn't the size of character in Java 2 bytes?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间