How do bit fields and their alignments work in C programming?

Posted 2019-06-13 06:07

I need your help at understanding how bit fields work in C programming.

I have declared this struct:

struct message
{
    unsigned char first_char : 6;
    unsigned char second_char : 6;
    unsigned char third_char : 6;
    unsigned char fourth_char : 6;
    unsigned char fifth_char : 6;
    unsigned char sixth_char : 6;
    unsigned char seventh_char : 6;
    unsigned char eigth_char : 6;
}__packed message;

I saved the size of the struct into an integer using sizeof(message).

I thought the size would be 6, since 6 * 8 = 48 bits, which is 6 bytes, but sizeof reports 8 bytes instead.

Can anyone explain to me why, and how exactly bit fields and their alignments work?

EDIT

I forgot to explain the situation in which I use the struct. Let's say I receive a packet of 6 bytes in this form: void * packet

I then cast the data like this:

message * msg = (message *)packet;

Now I want to print the value of each member. Although I declared the members as 6 bits wide, each member uses 8 bits, which causes wrong results when printing. For example, I receive the following data:

00001111 11110000 00110011 00001111 00111100 00011100

I thought the values of the members would be as shown below:

first_char = 000011

second = 111111

third = 000000

fourth = 110011

fifth = 000011

sixth = 110011

seventh = 110000

eigth = 011100

But that is not what is happening. I hope I explained it well; if not, please tell me.

2 answers
Root(大扎)
Reply #2 · 2019-06-13 06:38

Bit-fields don't have to run across different underlying elements ("units"), so you're witnessing that each of your fields occupies an entire unsigned char. The behaviour is implementation-defined, though; cf. C11 6.7.2.1/11:

An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

Additionally, no bit-field may be larger than what would fit into one single unit, by the constraint in 6.7.2.1/4:

The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted.

欢心
Reply #3 · 2019-06-13 06:39

Almost everything about bit-fields is implementation defined. In particular, how bit-fields are packed together is implementation defined. An implementation need not let bit-fields cross the boundaries of addressable storage units, and it appears that yours does not.

ISO/IEC 9899:2011 §6.7.2.1 Structure and union specifiers

¶11 An implementation may allocate any addressable storage unit large enough to hold a bit-field. If enough space remains, a bit-field that immediately follows another bit-field in a structure shall be packed into adjacent bits of the same unit. If insufficient space remains, whether a bit-field that does not fit is put into the next unit or overlaps adjacent units is implementation-defined. The order of allocation of bit-fields within a unit (high-order to low-order or low-order to high-order) is implementation-defined. The alignment of the addressable storage unit is unspecified.

And that is by no means the end of the 'implementation-defined' features of bit-fields.

[Please choose the answer by Kerrek SB rather than this one as it has the crucial information about §6.7.2.1 ¶4 as well.]


Example code

#include <stdio.h>

#if !defined(BITFIELD_BASE_TYPE)
#define BITFIELD_BASE_TYPE char
#endif

int main(void)
{
    typedef struct Message
    {
        unsigned BITFIELD_BASE_TYPE first_char   : 6;
        unsigned BITFIELD_BASE_TYPE second_char  : 6;
        unsigned BITFIELD_BASE_TYPE third_char   : 6;
        unsigned BITFIELD_BASE_TYPE fourth_char  : 6;
        unsigned BITFIELD_BASE_TYPE fifth_char   : 6;
        unsigned BITFIELD_BASE_TYPE sixth_char   : 6;
        unsigned BITFIELD_BASE_TYPE seventh_char : 6;
        unsigned BITFIELD_BASE_TYPE eighth_char  : 6;
    } Message;

    typedef union Bytes_Message
    {
        Message m;
        unsigned char b[sizeof(Message)];
    } Bytes_Message;

    Bytes_Message u;

    printf("Message size: %zu\n", sizeof(Message));

    u.m.first_char   = 0x3F;
    u.m.second_char  = 0x15;
    u.m.third_char   = 0x2A;
    u.m.fourth_char  = 0x11;
    u.m.fifth_char   = 0x00;
    u.m.sixth_char   = 0x23;
    u.m.seventh_char = 0x1C;
    u.m.eighth_char  = 0x3A;

    printf("Bit fields: %.2X %.2X %.2X %.2X %.2X %.2X %.2X %.2X\n",
           u.m.first_char,   u.m.second_char, u.m.third_char,
           u.m.fourth_char,  u.m.fifth_char,  u.m.sixth_char,
           u.m.seventh_char, u.m.eighth_char);

    printf("Bytes:     ");
    for (size_t i = 0; i < sizeof(Message); i++)
        printf(" %.2X", u.b[i]);
    putchar('\n');

    return 0;
}

Sample compilations and runs

Testing on Mac OS X 10.9.2 Mavericks with GCC 4.9.0 (64-bit build; sizeof(int) == 4 and sizeof(long) == 8). Source code is in bf.c; the program created is bf.

$ gcc -DBITFIELD_BASE_TYPE=char -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror bf.c -o bf
$ ./bf
Message size: 8
Bit fields: 3F 15 2A 11 00 23 1C 3A
Bytes:      3F 15 2A 11 00 23 1C 3A
$ gcc -DBITFIELD_BASE_TYPE=short -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror bf.c -o bf
$ ./bf
Message size: 8
Bit fields: 3F 15 2A 11 00 23 1C 3A
Bytes:      7F 05 6A 04 C0 08 9C 0E
$ gcc -DBITFIELD_BASE_TYPE=int -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror bf.c -o bf
$ ./bf
Message size: 8
Bit fields: 3F 15 2A 11 00 23 1C 3A
Bytes:      7F A5 46 00 23 A7 03 00
$ gcc -DBITFIELD_BASE_TYPE=long -O3 -g -std=c11 -Wall -Wextra -Wmissing-prototypes -Wstrict-prototypes -Wold-style-definition -Werror bf.c -o bf
$ ./bf
Message size: 8
Bit fields: 3F 15 2A 11 00 23 1C 3A
Bytes:      7F A5 46 C0 C8 E9 00 00
$

Note that there are 4 different sets of results for the 4 different type sizes. Note, too, that a compiler is not required to allow these types. The standard says (§6.7.2.1 again):

¶4 The expression that specifies the width of a bit-field shall be an integer constant expression with a nonnegative value that does not exceed the width of an object of the type that would be specified were the colon and expression omitted.[122] If the value is zero, the declaration shall have no declarator.

¶5 A bit-field shall have a type that is a qualified or unqualified version of _Bool, signed int, unsigned int, or some other implementation-defined type.

[122] While the number of bits in a _Bool object is at least CHAR_BIT, the width (number of sign and value bits) of a _Bool may be just 1 bit.


Another sub-question

Can you explain to me why I was wrong with thinking I would get the size of 6? I asked a lot of my friends but they don't know much about bit-fields.

I'm not sure I know all that much about bit-fields. I've never used them except in answers to questions on Stack Overflow. They're of no use when writing portable software, and I aim to write portable software — or, at least, software that is not gratuitously non-portable.

I imagine that you assumed a layout of the bits roughly equivalent to this:

+------+------+------+------+------+------+------+------+
|  f1  |  f2  |  f3  |  f4  |  f5  |  f6  |  f7  |  f8  |
+------+------+------+------+------+------+------+------+

It is supposed to represent 48 bits in 8 groups of 6 bits, laid out contiguously with no spaces or padding.

Now, one reason why that can't happen is the rule from §6.7.2.1 ¶4 that when you use a type T for a bit-field, then the width of the bit-field cannot be larger than CHAR_BIT * sizeof(T). In your code, T was unsigned char, so bit-fields cannot be larger than 8 bits or else they cross storage unit boundaries. Of course, yours are only 6 bits, but it means that you can't fit a second bit-field into the storage unit. If T is unsigned short, then only two 6-bit fields fit into a 16-bit storage unit; if T is a 32-bit int, then five 6-bit fields can fit; if T is a 64-bit unsigned long, then 10 6-bit fields can fit.

Another reason is that access to such bit-fields that cross byte boundaries would be moderately inefficient. For example, given (Message as defined in my example code):

Message bf = …initialization code…

int nv = 0x2A;
bf.second_char = nv;

Suppose that the code treated the values as being stored in a packed byte array with fields overlapping byte boundaries. Then the code needs to set the bits marked y below:

             Byte 0             |            Byte 1
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+
| x | x | x | x | x | x | y | y | y | y | y | y | z | z | z | z |
+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+---+

In this bit pattern, the x bits might correspond to first_char, the z bits to part of third_char, and the y bits to the old value of second_char. So the assignment has to preserve the first 6 bits of Byte 0 and store the top 2 bits of the new value in its last two bits, then store the remaining 4 bits of the new value in the first four bits of Byte 1 while preserving its last 4 bits:

((unsigned char *)&bf)[0] = (((unsigned char *)&bf)[0] & 0xFC) | ((nv >> 4) & 0x03);
((unsigned char *)&bf)[1] = (((unsigned char *)&bf)[1] & 0x0F) | ((nv << 4) & 0xF0);

If it is treated as a 16-bit unit, then the code would be equivalent to:

((unsigned short *)&bf)[0] = (((unsigned short *)&bf)[0] & 0xFC0F) | ((nv << 4) & 0x03F0);

The 32-bit or 64-bit assignments are somewhat similar to the 16-bit version:

((unsigned int  *)&bf)[0] = (((unsigned int  *)&bf)[0] & 0xFC0FFFFF) |
                                           ((nv << 20) & 0x03F00000);
((unsigned long *)&bf)[0] = (((unsigned long *)&bf)[0] & 0xFC0FFFFFFFFFFFFF) |
                                           (((unsigned long)nv << 52) & 0x03F0000000000000);

This makes a particular set of assumptions about the way the bits are laid out inside the bit-field. Different assumptions come up with slightly different expressions, but something analogous to this is needed if the bit-field is treated as a contiguous array of bits.

By comparison, with the 6-bits per byte layout actually used, the assignment becomes much simpler:

((unsigned char *)&bf)[1] = nv & 0x3F;

and it would be legitimate for the compiler to omit the mask operation shown, as the values in the padding bits are indeterminate (but the assignment would still have to be 8 bits wide).

The amount of code needed to access a bit-field is one reason why most people avoid them. The fact that different compilers can make different layout assumptions for the same definition means that values cannot be reliably passed between machines of different types. Usually, an ABI will define the details that Standard C does not, but if one machine is a PowerPC or SPARC and the other is based on Intel, then all bets are off. It becomes better to do the shifting and masking yourself; at least the cost of the computation is visible.
