I need your help understanding how bit fields work in C programming.
I have declared this struct:
struct message
{
unsigned char first_char : 6;
unsigned char second_char : 6;
unsigned char third_char : 6;
unsigned char fourth_char : 6;
unsigned char fifth_char : 6;
unsigned char sixth_char : 6;
unsigned char seventh_char : 6;
unsigned char eigth_char : 6;
}__packed message;
I saved the size of the struct into an integer using sizeof(message).
I expected the size to be 6, since 8 fields * 6 bits = 48 bits, which is 6 bytes, but instead the size is 8 bytes.
Can anyone explain to me why, and how exactly bit fields and their alignments work?
EDIT
I forgot to explain the situation where I use the struct.
Let's say I receive a packet of 6 bytes in this form:
void * packet
I then cast the data like this:
message * msg = (message *)packet;
Now I want to print the value of each member. Although I declared the members as 6 bits wide, each member uses 8 bits, which gives wrong results when printing. For example, I receive the following data:
00001111 11110000 00110011 00001111 00111100 00011100
I thought the values of the members would be as shown below:
first_char = 000011
second = 111111
third = 000000
fourth = 110011
fifth = 000011
sixth = 110011
seventh = 110000
eigth = 011100
But that is not what is happening. I hope I explained it well; if not, please tell me.
Bit-fields don't have to run across different underlying elements ("units"), so you're witnessing that each of your fields occupies an entire unsigned char. The behaviour is implementation-defined, though; cf. C11 6.7.2.1/11:
Additionally, no bit-field may be larger than what would fit into one single unit, by the constraint in 6.7.2.1/4:
Almost everything about bit-fields is implementation defined. In particular, how bit-fields are packed together is implementation defined. An implementation need not let bit-fields cross the boundaries of addressable storage units, and it appears that yours does not.
And that is by no means the end of the 'implementation-defined' features of bit-fields.
[Please choose the answer by Kerrek SB rather than this one as it has the crucial information about §6.7.2.1 ¶4 as well.]
Example code
Sample compilations and runs
Testing on Mac OS X 10.9.2 Mavericks with GCC 4.9.0 (64-bit build; sizeof(int) == 4 and sizeof(long) == 8). Source code is in bf.c; the program created is bf.
Note that there are 4 different sets of results for the 4 different type sizes. Note, too, that a compiler is not required to allow these types. The standard says (§6.7.2.1 again):
Another sub-question
I'm not sure I know all that much about bit-fields. I've never used them except in answers to questions on Stack Overflow. They're of no use when writing portable software, and I aim to write portable software — or, at least, software that is not gratuitously non-portable.
I imagine that you assumed a layout of the bits roughly equivalent to this:
It is supposed to represent 48 bits in 8 groups of 6 bits, laid out contiguously with no spaces or padding.
Now, one reason why that can't happen is the rule from §6.7.2.1 ¶4 that when you use a type T for a bit-field, then the width of the bit-field cannot be larger than CHAR_BIT * sizeof(T). In your code, T was unsigned char, so bit-fields cannot be larger than 8 bits or else they cross storage unit boundaries. Of course, yours are only 6 bits, but it means that you can't fit a second bit-field into the storage unit. If T is unsigned short, then only two 6-bit fields fit into a 16-bit storage unit; if T is a 32-bit int, then five 6-bit fields can fit; if T is a 64-bit unsigned long, then 10 6-bit fields can fit.
Another reason is that access to such bit-fields that cross byte boundaries would be moderately inefficient. For example, given (Message as defined in my example code):
Suppose that the code treated the values as being stored in a packed byte array with fields overlapping byte boundaries. Then the code needs to set the bits marked y below:
This is a pattern of bits. The x bits might correspond to first_char; the z bits might correspond to part of third_char; and the y bits to the old value of second_char. So, the assignment has to copy the first 6 bits of Byte 0 and assign 2 bits of the new value to the last two bits:
If it is treated as a 16-bit unit, then the code would be equivalent to:
The 32-bit or 64-bit assignments are somewhat similar to the 16-bit version:
This makes a particular set of assumptions about the way the bits are laid out inside the bit-field. Different assumptions come up with slightly different expressions, but something analogous to this is needed if the bit-field is treated as a contiguous array of bits.
By comparison, with the 6-bits per byte layout actually used, the assignment becomes much simpler:
and it would be legitimate for the compiler to omit the mask operation shown, as the values in the padding bits are indeterminate (but the value would have to be an 8-bit assignment).
The amount of code needed to access a bit-field is one reason why most people avoid them. The fact that different compilers can make different layout assumptions for the same definition means that values cannot be reliably passed between machines of different types. Usually, an ABI will define the details that Standard C does not, but if one machine is a PowerPC or SPARC and the other is based on Intel, then all bets are off. It becomes better to do the shifting and masking yourself; at least the cost of the computation is visible.