Using a C pointer with a char array

Posted 2020-07-22 19:39

贼婆χ

Question:

int i=512;
char *c = (char *)&i;
c[0] =1;
printf("%d",i);

This displays "513": it adds 1 to i.

int i=512;
char *c = (char *)&i;
c[1] =1;
printf("%d",i);

This one, however, displays 256: it divides i by 2. Can someone please explain why? Thanks a lot.

Answer 1:

Binary

The 32-bit number 512 expressed in binary, is just:

00000000000000000000001000000000

because 2 to the power of 9 is 512. Conventionally, you read the bits from right-to-left.

Here are some other decimal numbers in binary:

The Cast: Reinterpreting the Int as an Array of Bytes

When you do this:

int i = 512;
char *c = (char *)&i;

you are interpreting the 4-byte integer as an array of characters (8-bit bytes), as you probably know. If not, here's what's going on:

&i

takes the address of the variable i.

(char *)&i

reinterprets it (or casts it) to a pointer to char type. This means it can now be used like an array. Assuming an int is 32 bits (4 bytes) on your machine, you can access its bytes using c[0], c[1], c[2], c[3].

Depending on the endianness of the system, the bytes of the number might be laid out: most significant byte first (big endian), or least significant byte first (little endian). x86 processors are little endian. This basically means the number 512 is laid out as in the example above, i.e.:

00000000 00000000 00000010 00000000
    c[3]     c[2]     c[1]     c[0]

I've grouped the bits into separate 8-bit chunks (bytes) corresponding to the way they are laid out in memory. Note, you also read them right-to-left here, so we can keep with conventions for the binary number system.

Consequences

Now setting c[0] = 1 has this effect:

00000000 00000000 00000010 00000001
    c[3]     c[2]     c[1]     c[0]

which is 2^9 + 2^0 == 513 in decimal.

Setting c[1] = 1 has this effect:

00000000 00000000 00000001 00000000
    c[3]     c[2]     c[1]     c[0]

which is 2^8 == 256 in decimal, because you've overwritten the second byte 00000010 with 00000001.

Note that on a big-endian system the bytes are stored in the reverse order compared to a little-endian system. Running the same code on such a machine would give completely different results.

Answer 2:

Remember that a char is 8 bits. The bit representation of 512 is
512 = 10 0000 0000

When you do char *c = (char *)&i; you get (on a little-endian machine):

c[1] = 0000 0010 c[0] = 0000 0000

When you do c[0] = 1, the value becomes 10 0000 0001, which is 513.

When you do c[1] = 1, the value becomes 01 0000 0000, which is 256.

Answer 3:

Before you wonder why what you're seeing is "odd", consider the platform you're running your code on, and the endianness therein.

Then consider the following:

#include <stdio.h>

int main(void)
{
    int i = 512;
    printf("%d : ", i);
    /* View the int as raw bytes; unsigned char prints cleanly with %02X. */
    unsigned char *p = (unsigned char *)&i;
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    /* Overwrite the byte at the lowest address. */
    char *c = (char *)&i;
    c[0] = 1;
    printf("%d : ", i);
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");

    /* Reset, then overwrite the byte at the next address. */
    i = 512;
    c[1] = 1;
    printf("%d : ", i);
    for (size_t j = 0; j < sizeof(i); j++)
        printf("%02X", p[j]);
    printf("\n");
    return 0;
}

On my platform (MacBook Air, OS X 10.8, Intel x64 architecture) this prints:

512 : 00020000
513 : 01020000
256 : 00010000

Couple what you see above with what you have hopefully read about endianness, and you can clearly see my platform is little endian. So what's yours?

Answer 4:

Since you are aliasing an int through a char pointer, and a char is 8 bits wide (a byte), the assignment:

c[1] = 1;

will set the second byte of i to 00000001. Bytes 1, 3 and 4 (if sizeof(int) == 4) will stay unmodified. Previously, that second byte was 00000010 (I assume you're on an x86-based computer, which is a little-endian architecture). So basically, you shifted the only bit that was set one position to the right. That's a division by 2.

On a little-endian machine and a compiler with 32-bit int, you originally had these four bytes in i:

  c[0]      c[1]      c[2]     c[3]
00000000  00000010  00000000 00000000

After the assignment, i was set to:

  c[0]      c[1]      c[2]     c[3]
00000000  00000001  00000000 00000000

and therefore it went from 512 to 256.

Now you should understand why c[0] = 1 results in 513 :-) Think about which byte is set to 1 and that the assignment doesn't change the other bytes at all.

Answer 5:

It's because your machine is little endian, meaning the least-significant byte is stored first in memory.

You said int i=512;. 512 is 0x00000200 in hex (assuming a 32-bit int for simplicity). Let's look at how i would be stored in memory as hexadecimal bytes:

  00 02 00 00  // 4 bytes, least-significant byte first

Now we interpret that same memory location as a character array by doing char *c = (char *)&i; - same memory, different interpretation:

  00 02 00 00
c[0][1][2][3]

Now we change c[0] with c[0] =1; and the memory looks like

  01 02 00 00

Which means if we look at it as a little endian int again (by doing printf("%d",i);), it's hex 0x00000201, which is 513 decimal.

Now if we go back to the original i = 512 and instead change c[1] with c[1] =1;, your memory becomes:

  00 01 00 00

Now we go back and interpret it as a little endian int, it's hex 0x00000100, which is 256 decimal.

Answer 6:

It depends on whether the machine is little endian or big endian, which determines the order in which the bytes of the value are stored. For more, read about endianness.

The C language does not guarantee a particular byte order.

512 in binary:

    =============================================
    0000 0000 | 0000 0000 | 0000 0010 | 0000 0000   ==>512
    =============================================
      12          34          56          78

(Suppose the four bytes of this int are labeled 12, 34, 56 and 78, as in 0x12345678.)

After char *c = (char *)&i, c[0] points either to the byte labeled 78 (little endian) or to the byte labeled 12 (big endian).
Modifying the value through c[0] gives 513 if it points to the byte labeled 78:
    =============================================
    0000 0000 | 0000 0000 | 0000 0010 | 0000 0001   ==> 513
    =============================================

or, if it points to the byte labeled 12:

    =============================================
    0000 0001 | 0000 0000 | 0000 0010 | 0000 0000  ==>2^24+512
    =============================================

Similarly for 256: on a little-endian machine, c[1] holds the address of the second byte from the right in the figure, so c[1] = 1 gives:

    =============================================
    0000 0000 | 0000 0000 | 0000 0001 | 0000 0000  ==>256
    =============================================

So the result depends on how the implementation represents numbers in memory.