When an int is cast to a short and truncated, how

2019-01-11 22:09发布

问题:

Can someone clarify what happens when an integer is cast to a short in C? I'm using Raspberry Pi, so I'm aware that an int is 32 bits, and therefore a short must be 16 bits.

Let's say I use the following C code for example:

int x = 0x1248642;
short sx = (short)x;
int y = sx;

I get that x would be truncated, but can someone explain how exactly? Are shifts used? How exactly is a number truncated from 32 bits to 16 bits?

回答1:

According to the ISO C standard, when you convert an integer to a signed type, and the value is outside the range of the target type, the result is implementation-defined. (Or an implementation-defined signal can be raised, but I don't know of any compilers that do this.)

In practice, the most common behavior is that the high-order bits are discarded. So assuming int is 32 bits and short is 16 bits, converting the value 0x1248642 will probably yield a bit pattern that looks like 0x8642. And assuming a two's-complement representation for signed types (which is used on almost all systems), the high-order bit is the sign bit, so the numeric value of the result will be -31166.

int y   =   sx;

This also involves an implicit conversion, from short to int. Since the range of int is guaranteed to cover at least the entire range of short, the value is unchanged. (Since, in your example, the value of sx happens to be negative, this change of representation is likely to involve sign extension, propagating the 1 sign bit to all 16 high-order bits of the result.)

As I indicated, none of these details are required by the language standard. If you really want to truncate values to a narrower type, it's probably best to use unsigned types (which have language-specified wraparound behavior) and perhaps explicit masking operations, like this:

unsigned int x = 0x1248642;
unsigned short sx = x & 0xFFFF;

If you have a 32-bit quantity that you want to shove into a 16-bit variable, the first thing you should do is decide how you want your code to behave if the value doesn't fit. Once you've decided that, you can figure out how to write C code that does what you want. Sometimes truncation happens to be what you want, in which case your task is going to be easy, especially if you're using unsigned types. Sometimes an out-of-range value is an error, in which case you need to check for it and decide how to handle the error. Sometimes you might want the value to saturate, rather than truncate, so you'll need to write code to do that.

Knowing how conversions work in C is important, but if you start with that question you just might be approaching your problem from the wrong direction.



回答2:

The 32 bit value is truncated to 16 bits in the same way a 32cm long banana bread would be cut if you jam it into a 16cm long pan. Half of it would fit in and still be a banana bread, and the rest will be "gone".



回答3:

Truncation happens in CPU registers. These have different sizes: 8/16/32/64 bits. Now, you can imagine a register like:

<--rax----------------------------------------------------------------> (64-bit)
                                    <--eax----------------------------> (32-bit)
                                                      <--ax-----------> (16-bit)
                                                      <--ah--> <--al--> (8-bit high & low)
01100011 01100001 01110010 01110010 01111001 00100000 01101111 01101110

x is first given the 32 bit value 0x1248642. In memory*, it'll look like:

-----------------------------
|  01  |  24  |  86  |  42  |
-----------------------------
 31..24 23..16 15..8  7..0       

Now, the compiler loads x in a register. From it, it can simply load the least significant 16 bits (namely, ax) and store them into sx.


*Endianness is not taken into account for the sake of simplicity



回答4:

Simply the high 16 bits are cut off from the integer. Therefore your short will become 0x8642 which is actually negative number -31166.



回答5:

Perhaps let the code speak for itself:

#include <stdio.h>

#define BYTETOBINARYPATTERN "%d%d%d%d%d%d%d%d"
#define BYTETOBINARY(byte)  \
   ((byte) & 0x80 ? 1 : 0), \
   ((byte) & 0x40 ? 1 : 0), \
   ((byte) & 0x20 ? 1 : 0), \
   ((byte) & 0x10 ? 1 : 0), \
   ((byte) & 0x08 ? 1 : 0), \
   ((byte) & 0x04 ? 1 : 0), \
   ((byte) & 0x02 ? 1 : 0), \
   ((byte) & 0x01 ? 1 : 0) 

int main()
{
    int x    =   0x1248642;
    short sx = (short) x;
    int y    =   sx;

    printf("%d\n", x);
    printf("%hu\n", sx);
    printf("%d\n", y);

    printf("x: "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN"\n",
        BYTETOBINARY(x>>24), BYTETOBINARY(x>>16), BYTETOBINARY(x>>8), BYTETOBINARY(x));

    printf("sx: "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN"\n",
        BYTETOBINARY(y>>8), BYTETOBINARY(y));

    printf("y: "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN" "BYTETOBINARYPATTERN"\n",
        BYTETOBINARY(y>>24), BYTETOBINARY(y>>16), BYTETOBINARY(y>>8), BYTETOBINARY(y));

    return 0;
}

Output:

19170882
34370
-31166

x: 00000001 00100100 10000110 01000010
sx: 10000110 01000010
y: 11111111 11111111 10000110 01000010

As you can see, int -> short yields the lower 16 bits, as expected.

Casting short to int yields the short with the 16 high bits set. However, I suspect this is implementation specific and undefined behavior. You're essentially interpreting 16 bits of memory as an integer, which reads 16 extra bits of whatever rubbish happens to be there (or 1's if the compiler is nice and wants to help you find bugs quicker).

I think it should be safe to do the following:

int y = 0x0000FFFF & sx;

Obviously you won't get back the lost bits, but this will guarantee that the high bits are properly zeroed.

If anyone can verify the short -> int high bit behavior with an authoritative reference, that would be appreciated.

Note: Binary macro adapted from this answer.



回答6:

sx value will be the same as 2 least significant bytes of x, in this case it will be 0x8642 which (if interpreted as 16 bit signed integer) gives -31166 in decimal.