Char size 8 bit or 16 bit?

Published 2020-07-02 12:39

Question:

According to http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html, the size of a char is 16 bits, i.e. 2 bytes. Somehow I recalled it being 8 bits, i.e. 1 byte. To clear up my doubt, I created a text file containing the single character "a" and saved it. Then I inspected the size of the file: it is 1 byte, i.e. 8 bits. I am confused: what is the size of a character? If it is 2 bytes, why is the file size 1 byte, and if it is 1 byte, why does the link say 2 bytes?

Answer 1:

A char in Java is a UTF-16 code unit. It's not necessarily a complete Unicode character, but it's effectively an unsigned 16-bit integer.

When you write text to a file (or in some other way convert it into a sequence of bytes), then the data will depend on which encoding you use. For example, if you use ASCII or ISO-8859-1 then you're very limited as to which characters you can write, but each character will only be a byte. If you use UTF-16, then each Java char will be converted into exactly two bytes - but some Unicode characters may take four bytes (those represented by two Java char values).

If you use UTF-8, then the length of even a single Java char in the encoded form will depend on the value.
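To make the encoding dependence concrete, here is a minimal sketch that counts the bytes produced for a few sample strings. The class name EncodedLength and the helper encodedLength are made up for this illustration. It uses StandardCharsets.UTF_16BE rather than StandardCharsets.UTF_16 because the latter prepends a 2-byte byte-order mark when encoding, which would inflate the counts.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedLength {
    // Number of bytes the text occupies once encoded with the given charset.
    // (Hypothetical helper, named for this example only.)
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        // "a", e-acute, the euro sign, and U+1F600 (a surrogate pair, i.e. two Java chars)
        String[] samples = {"a", "\u00e9", "\u20ac", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: %d char(s), UTF-8 = %d byte(s), UTF-16BE = %d byte(s)%n",
                    s.codePointAt(0), s.length(),
                    encodedLength(s, StandardCharsets.UTF_8),
                    encodedLength(s, StandardCharsets.UTF_16BE));
        }
        // A single-byte encoding stores "a" in exactly one byte.
        System.out.println("ISO-8859-1 \"a\" = "
                + encodedLength("a", StandardCharsets.ISO_8859_1) + " byte(s)");
    }
}
```

With these samples, "a" takes 1 byte in UTF-8 and 2 in UTF-16BE, U+00E9 takes 2 and 2, U+20AC takes 3 and 2, and U+1F600 (two Java chars) takes 4 and 4.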



Answer 2:

Since Java 8 you can read the size directly from the constant Character.BYTES. Just print it:

System.out.println(Character.BYTES);

This prints 2.
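The related constants tell the same story. The sketch below (class and method names are invented for illustration) reads Character.SIZE and Character.MAX_VALUE and shows that a char wraps around like an unsigned 16-bit integer.

```java
public class CharSize {
    // Bits used to represent a char value in memory.
    static int bitsPerChar() {
        return Character.SIZE;
    }

    // Largest value a char can hold, widened to int for printing.
    static int maxCharValue() {
        return Character.MAX_VALUE;
    }

    // Adding 1 to the maximum and narrowing back to char wraps to 0,
    // which is what an unsigned 16-bit integer would do.
    static int wrapAround() {
        return (char) (Character.MAX_VALUE + 1);
    }

    public static void main(String[] args) {
        System.out.println(Character.BYTES); // 2
        System.out.println(bitsPerChar());   // 16
        System.out.println(maxCharValue());  // 65535
        System.out.println(wrapAround());    // 0
    }
}
```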



Answer 3:

Note that a text file always has a character encoding (character set) associated with it. Text files are often saved as UTF-8, which uses 8 bits (1 byte) for each ASCII character such as "a" and 2 to 4 bytes for every other character.



Answer 4:

A char in Java is 2 bytes large (as its valid value range, 0 to 65,535, suggests). But that doesn't necessarily mean that every representation of a character is 2 bytes long. For instance, many encodings use only 1 byte for every character (or use 1 byte for the most frequent characters). If the platform default encoding is a 1-byte encoding such as ISO-8859-1 or a variable-length encoding such as UTF-8, Java can easily convert the 1 byte in the file into a single char.
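The experiment from the question can be reproduced roughly as follows. This is only a sketch: the class name OneByteFile and its two helpers are hypothetical, and UTF-8 is passed explicitly instead of relying on the platform default, so the result does not depend on the machine it runs on.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class OneByteFile {
    // Writes the text as UTF-8 to a temporary file and returns the file size in bytes.
    static long sizeOnDisk(String text) {
        try {
            Path file = Files.createTempFile("char-size", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return Files.size(file);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes the text as UTF-8, reads the bytes back, and decodes them into a String.
    static String roundTrip(String text) {
        try {
            Path file = Files.createTempFile("char-size", ".txt");
            try {
                Files.write(file, text.getBytes(StandardCharsets.UTF_8));
                return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes on disk: " + sizeOnDisk("a"));
        char c = roundTrip("a").charAt(0);
        System.out.println("chars after decoding: 1, bytes per char in memory: " + Character.BYTES
                + ", value: " + c);
    }
}
```

The file holds 1 byte, while the decoded char still occupies 2 bytes in memory; the two numbers measure different things.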



Tags: java char byte