Char into byte? (Java)

Posted 2019-03-09 21:42

Question:

How come this happens:

char a = '\uffff'; //Highest value that char can take - 65535
byte b = (byte)a; //Casting a 16-bit value into 8-bit data type...! Isn't data lost here?
char c = (char)b; //Let's get the value back
int d = (int)c;
System.out.println(d); //65535... how?

Basically, I saw that a char is 16-bit. Therefore, if you cast it into a byte, how come no data is lost? (Value is the same after casting into an int)

Thanks in advance for answering this little ignorant question of mine. :P

EDIT: Woah, found out that my original output actually did as expected, but I just updated the code above. Basically, a character is cast into a byte and then cast back into a char, and its original, 2-byte value is retained. How does this happen?

Answer 1:

As trojanfoe states, your confusion about the results of your code is partly due to sign extension. I'll try to add a more detailed explanation that may help.

char a = '\uffff';
byte b = (byte)a;  // b = 0xFF

As you noted, this DOES result in the loss of information. This is considered a narrowing conversion. Converting a char to a byte "simply discards all but the n lowest order bits".
The result is: 0xFFFF -> 0xFF

char c = (char)b;  // c = 0xFFFF

Converting a byte to a char is a special case that the Java Language Specification calls a "widening and narrowing primitive conversion". It actually performs TWO conversions. First, the byte is SIGN-extended (the new high order bits are copied from the old sign bit) to an int (a normal widening conversion). Second, the int is converted to a char with a narrowing conversion.
The result is: 0xFF -> 0xFFFFFFFF -> 0xFFFF

int d = (int)c;  // d = 0x0000FFFF

Converting a char to an int is considered a widening conversion. When a char type is widened to an integral type, it is ZERO-extended (the new high order bits are set to 0).
The result is: 0xFFFF -> 0x0000FFFF. When printed, this will give you 65535.
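
The three steps above can be traced in a small program. This is only a sketch: the class name CharByteRoundTrip and its helper methods are made up for illustration, and the intermediate int is shown explicitly even though the compiler performs that step implicitly.

```java
class CharByteRoundTrip {
    // Narrowing: keeps only the low 8 bits of the 16-bit char.
    static byte narrow(char ch) {
        return (byte) ch;
    }

    // byte -> char: sign-extended to int first, then cut down to the low 16 bits.
    static char widenThenNarrow(byte b) {
        return (char) b;
    }

    // char -> int: zero-extended, so the result is never negative.
    static int widen(char ch) {
        return ch;
    }

    public static void main(String[] args) {
        char a = (char) 0xFFFF;
        byte b = narrow(a);
        int signExtended = b;            // the hidden intermediate step
        char c = widenThenNarrow(b);
        int d = widen(c);

        System.out.printf("a            = 0x%04X%n", (int) a);          // 0xFFFF
        System.out.printf("b            = 0x%02X (%d)%n", b & 0xFF, b); // 0xFF (-1)
        System.out.printf("signExtended = 0x%08X%n", signExtended);     // 0xFFFFFFFF
        System.out.printf("c            = 0x%04X%n", (int) c);          // 0xFFFF
        System.out.printf("d            = 0x%08X (%d)%n", d, d);        // 0x0000FFFF (65535)
    }
}
```

The value only appears to survive because every discarded bit happened to be 1, which is exactly what sign extension puts back.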

The three links I provided are the official Java Language Specification details on primitive type conversions. I HIGHLY recommend you take a look. They are not terribly verbose (and in this case relatively straightforward). They detail exactly what Java will do behind the scenes with type conversions. This is a common area of misunderstanding for many developers. Post a comment if you are still confused about any step.



Answer 2:

It's sign extension. Try \u1234 instead of \uffff and see what happens.
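
Here is a quick way to run that experiment, as a rough sketch. The class SignExtensionProbe and the method roundTrip are hypothetical names, and the third test value (0x0080) is an extra case added to show a byte whose sign bit is set while the original high byte was zero.

```java
class SignExtensionProbe {
    // Casts char -> byte -> char -> int, the same chain as in the question.
    static int roundTrip(char original) {
        byte narrowed = (byte) original;   // keeps the low 8 bits only
        char restored = (char) narrowed;   // sign-extends, then keeps 16 bits
        return restored;                   // zero-extends to int
    }

    public static void main(String[] args) {
        char[] samples = { (char) 0xFFFF, (char) 0x1234, (char) 0x0080 };
        for (char sample : samples) {
            System.out.printf("0x%04X -> %d%n", (int) sample, roundTrip(sample));
        }
        // Prints:
        // 0xFFFF -> 65535   (low byte 0xFF, sign bit set, refilled with ones)
        // 0x1234 -> 52      (low byte 0x34, sign bit clear, high byte lost)
        // 0x0080 -> 65408   (low byte 0x80, sign bit set, becomes 0xFF80)
    }
}
```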



Answer 3:

Java's byte is signed, which is counterintuitive. In almost all situations where a byte is used, programmers want an unsigned byte instead. Casting a byte directly to an int is very likely a bug.

This does the intended conversion correctly in almost all programs:

int c = 0xff & b;

Empirically, the choice of signed byte is a mistake.
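
To make the difference concrete, here is a small sketch comparing a plain cast with the mask. The class and method names are invented for the example. Byte.toUnsignedInt is a standard library method, but only on Java 8 or later, which is an assumption about your runtime.

```java
class UnsignedByteDemo {
    // Treats the byte's 8 bits as an unsigned value in the range 0..255.
    static int toUnsignedWithMask(byte b) {
        return b & 0xFF;
    }

    public static void main(String[] args) {
        byte b = (byte) 0xFF;

        int signExtended = b;                     // plain widening keeps the sign
        int masked = toUnsignedWithMask(b);       // mask clears the extended bits
        int viaLibrary = Byte.toUnsignedInt(b);   // same result, Java 8+
        char asChar = (char) (b & 0xFF);          // avoids 0xFFFF when going to char

        System.out.println(signExtended);         // -1
        System.out.println(masked);               // 255
        System.out.println(viaLibrary);           // 255
        System.out.println((int) asChar);         // 255
    }
}
```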



Answer 4:

Some rather strange stuff is going on on your machine. Take a look at the Java Language Specification, section 4.2.1:

The values of the integral types are integers in the following ranges:

For byte, from -128 to 127, inclusive

... snip others...

If your JVM is standards compliant, then the byte b holds -1, and printing b directly should output -1.
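
A minimal check of that range, assuming nothing beyond the standard Byte constants. The class name ByteRangeDemo and the helper castToByte are made up for this sketch.

```java
class ByteRangeDemo {
    // Returns the byte produced by narrowing a char, widened to int for printing.
    static int castToByte(char ch) {
        return (byte) ch;
    }

    public static void main(String[] args) {
        System.out.println(Byte.MIN_VALUE);              // -128
        System.out.println(Byte.MAX_VALUE);              // 127
        System.out.println(castToByte((char) 0xFFFF));   // -1
    }
}
```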