Strange output when printing the value 0x89 (-119)

Posted 2020-05-09 09:16

Question:

As the title says, I get a "weird" result when running the following code:

#include <stdio.h>

int main()
{
    char buff[4] = {0x17, 0x89, 0x39, 0x40};
    unsigned int* ptr = (unsigned int*)buff;
    char a = (char)((*ptr << (0*8)) >> (3*8));
    char b = (char)((*ptr << (1*8)) >> (3*8));
    char c = (char)((*ptr << (2*8)) >> (3*8));
    char d = (char)((*ptr << (3*8)) >> (3*8));

    printf("0x%x\n", *ptr);
    printf("0x%x\n", a);
    printf("0x%x\n", b);
    printf("0x%x\n", c);
    printf("0x%x\n", d);

    return 0;
}

Output:

0x40398917
0x40
0x39
0xffffff89
0x17

Why am I not getting 0x89?

Answer 1:

It's because your char variables are signed, so they undergo sign extension when they are promoted to int, which happens to every char argument passed to a variadic function such as printf. Sign extension preserves the value during that widening, so -119 stays -119 whether it is held in 8, 16 or 32 bits. The %x conversion then shows the bit pattern of that 32-bit -119, which is 0xffffff89.

You can fix it by explicitly using unsigned char since, in C at least, whether plain char is signed or unsigned is implementation-defined. From C11 6.2.5 Types /15:

The implementation shall define char to have the same range, representation, and behavior as either signed char or unsigned char.

Sign extension does not come into play for unsigned types because they're, ... well, unsigned :-)



Answer 2:

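char is signed on your implementation (the C standard leaves that choice to the implementation), which means its values run from -128 to 127. 0x89 is 137, which doesn't fit in that range, so it ends up stored as -119. If you change char to unsigned char, you will get the numbers you expect.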



Answer 3:

Use memcpy, not a cast

char buff[4] = {0x17, 0x89, 0x39, 0x40};
unsigned int* ptr = (unsigned int*)buff;

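This is not correct: buff is an array of char, not an unsigned int object, so reading it through *ptr violates C's effective-type (strict aliasing) rules and has undefined behavior. The cast (unsigned int*)buff is itself undefined if buff is not suitably aligned for unsigned int.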

The safe way to reinterpret buff as an unsigned int is with memcpy:

char buff[4] = {0x17, 0x89, 0x39, 0x40};
unsigned int ui;
assert (sizeof ui == sizeof buff);
memcpy (&ui, buff, sizeof ui);

When using memcpy, you have to make sure that the bit representation you copy is valid for the destination type, of course.

One portable but degenerate way to do that is to check that the representation matches an existing object (beware, the following is silly code):

char *null_ptr = 0;
char null_bytes[sizeof null_ptr] = {0};
if (memcmp (&null_ptr, null_bytes, sizeof null_bytes)==0) {
    char *ptr2;
    memcpy (&ptr2, null_bytes, sizeof null_bytes);
    assert (ptr2 == 0);
}

This code uses memcpy and has fully defined behavior (even if useless). OTOH, the behavior of

int *ptr3 = (int*)null_bytes;

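is not guaranteed to be defined, because null_bytes is a char array that may not be suitably aligned for int. Dereferencing ptr3 would be undefined regardless, because null_bytes does not hold an int or unsigned int object.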



Tags: c++ c hex output