Alignment of char array struct members in C standa

Posted 2019-05-10 23:48

Question:

Let us suppose I would like to read/write a tar file header. Considering standard C (C89, C99, or C11), do char arrays have any special treatment in structs, regarding padding? Can the compiler add padding to such a struct:

struct header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char tail[255];
};

I've seen it used in code on the web as well: the struct is simply fread or fwritten to the file in one chunk, on the assumption that there will be no padding (and, of course, that CHAR_BIT == 8). Such C code seems so common that I would expect the standard to deal with this case, but I can't find it there; maybe I would not make a good lawyer.

EDIT

The accepted answer would give a strict, or the strictest possible, portable implementation according to one of the C standards that lets me treat these fields with standard library string functions, taking CHAR_BIT and all into account. I'm thinking one needs to read an array of 512 uint8_t for this, and after that maybe convert them to chars, one by one. Is there an easier way?

Answer 1:

C11 (the latest freely available draft) says only "There may be unnamed padding within a structure object, but not at its beginning" (§6.7.2.1 ¶15) and "There may be unnamed padding at the end of a structure or union" (§6.7.2.1 ¶17). It gives no further restriction on padding within a structure.

The platform ABI may have more stringent requirements on padding, but depending on this will be platform-specific, as other platforms may have other padding requirements. The x86-64 ABI for Unix/Linux gives char 1 byte alignment, and specifies:

Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate alignment. The size of any object is always a multiple of the object’s alignment.

An array uses the same alignment as its elements, except that a local or global array variable of length at least 16 bytes or a C99 variable-length array variable always has alignment of at least 16 bytes. [4]

Structure and union objects can require padding to meet size and alignment constraints. The contents of any padding is undefined.

[4] The alignment requirement allows the use of SSE instructions when operating on the array. The compiler cannot in general calculate the size of a variable-length array (VLA), but it is expected that most VLAs will require at least 16 bytes, so it is logical to mandate that VLAs have at least a 16-byte alignment.

This seems to imply that on this platform, there will be no padding within the struct. However, there are cases in which array variables have stricter alignment restriction in order to be able to be used with vector instructions; other platforms may impose such restrictions on array structure members as well.
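If you do decide to rely on the platform's layout, one way to make that assumption explicit is a compile-time check. The following is only a sketch, assuming a C11 compiler (static_assert is not available in C89 or C99); the offsets are simply the running sums of the member sizes in the struct from the question.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>   /* offsetof */

struct header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char tail[255];
};

/* Fail the build if the compiler inserted padding anywhere in the struct. */
static_assert(offsetof(struct header, mode) == 100, "padding after name");
static_assert(offsetof(struct header, size) == 124, "padding before size");
static_assert(offsetof(struct header, typeflag) == 156, "padding before typeflag");
static_assert(offsetof(struct header, linkname) == 157, "padding after typeflag");
static_assert(offsetof(struct header, tail) == 257, "padding after linkname");
static_assert(sizeof(struct header) == 512, "struct header is not 512 bytes");
```

This does not make the code portable; it only turns a silent layout mismatch into a build error on platforms where the assumption does not hold.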

If you would like to avoid depending on the struct layout while still reading the structure in a single call, you might want to look at readv. This is a vectored or scatter/gather I/O operation, which allows you to specify an array of buffers and their lengths to read into. For instance, for this case you might write:

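#include <sys/uio.h>    /* readv, struct iovec (POSIX, not standard C) */

struct header h;
struct iovec iov[10];   /* one entry per member of struct header */
iov[0].iov_base = h.name;
iov[0].iov_len = sizeof(h.name);
iov[1].iov_base = h.mode;
iov[1].iov_len = sizeof(h.mode);
/* ... etc ... */
/* fd is an open file descriptor; returns the total bytes read, or -1 on error */
ssize_t bytes_read = readv(fd, iov, 10);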

Note that readv is defined in POSIX/Single Unix Specification, not in the C standard. In standard C, the easiest thing to do is just read each of these elements individually (and even with vectored I/O available, reading and writing each element individually will probably be clearer unless you absolutely need to use a single call for the whole I/O operation).
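Reading each element individually with only standard C might look roughly like the sketch below. The names read_field and read_header are hypothetical helpers invented for this illustration, and the struct is the one from the question. Because every member is read into its own array, the result does not depend on whether the compiler pads the struct.

```c
#include <assert.h>
#include <stdio.h>

struct header {
    char name[100];
    char mode[8];
    char uid[8];
    char gid[8];
    char size[12];
    char mtime[12];
    char chksum[8];
    char typeflag;
    char linkname[100];
    char tail[255];
};

/* Read exactly len bytes into dst; returns 1 on success, 0 on a short read. */
static int read_field(FILE *fp, char *dst, size_t len)
{
    return fread(dst, 1, len, fp) == len;
}

/* Hypothetical helper: fill *h from the next 512 bytes of fp, one member
   at a time. Returns 1 on success, 0 on a short read or error. */
int read_header(FILE *fp, struct header *h)
{
    assert(fp != NULL && h != NULL);
    return read_field(fp, h->name, sizeof h->name)
        && read_field(fp, h->mode, sizeof h->mode)
        && read_field(fp, h->uid, sizeof h->uid)
        && read_field(fp, h->gid, sizeof h->gid)
        && read_field(fp, h->size, sizeof h->size)
        && read_field(fp, h->mtime, sizeof h->mtime)
        && read_field(fp, h->chksum, sizeof h->chksum)
        && read_field(fp, &h->typeflag, 1)
        && read_field(fp, h->linkname, sizeof h->linkname)
        && read_field(fp, h->tail, sizeof h->tail);
}
```

A caller would open the file in binary mode ("rb") and call read_header(fp, &h) once per header block. A matching writer would follow the same pattern with fwrite.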

In your edit, you write:

The accepted answer would give a strict, or the strictest possible, portable implementation according to one of the C standards that lets me treat these fields with standard library string functions, taking CHAR_BIT and all into account. I'm thinking one needs to read an array of 512 uint8_t for this, and after that maybe convert them to chars, one by one. Is there an easier way?

The C specification does not guarantee that uint8_t is available: "The typedef name uintN_t designates an unsigned integer type with width N and no padding bits.... These types are optional." (C11 draft, §7.20.1.1, ¶2–3). However, if 8-bit types are available at all, then char is guaranteed to be exactly 8 bits wide, as it is guaranteed to be at least 8 bits and to be the smallest object that is not a bit-field (§5.2.4.2.1 ¶1):

The values given below shall be replaced by constant expressions suitable for use in #if preprocessing directives. Moreover, except for CHAR_BIT and MB_LEN_MAX, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.

— number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT                              8

So, if you don't have 8-bit bytes available, you won't be able to read these fields in directly and access octets from them as individual array elements; you would have to manually split out individual octets using bit shifting and masking. However, there are no modern architectures that I know of which lack 8-bit bytes (for general-purpose computing, where file I/O is at all a concern; some DSPs might, but they probably won't have standard C file I/O).

If you do have 8-bit bytes, then char is guaranteed to be 8 bits, so there's not much benefit other than clarity in using uint8_t rather than char. If you're really concerned, I would just ensure that you have a check somewhere in your build process that CHAR_BIT is 8 and call it good.
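Such a check can be a few preprocessor lines, and once it is in place the fields can be handed to the standard library as plain char data. The sketch below is illustrative only: parse_octal_field is a hypothetical helper, and it assumes the usual tar convention that numeric fields hold octal ASCII digits and are not necessarily NUL-terminated, which is why the field is copied into a terminated buffer first.

```c
#include <assert.h>
#include <limits.h>   /* CHAR_BIT */
#include <stdlib.h>   /* strtoul */
#include <string.h>   /* memcpy */

/* Refuse to build on platforms where a char is not an octet. */
#if CHAR_BIT != 8
#error "this code assumes 8-bit bytes"
#endif

/* Hypothetical helper: parse a fixed-width field of octal digits (assumed
   tar convention). The field may lack a terminating NUL, so it is copied
   into a local buffer and terminated before strtoul sees it. */
unsigned long parse_octal_field(const char *field, size_t len)
{
    char buf[32];
    assert(field != NULL && len < sizeof buf);
    memcpy(buf, field, len);
    buf[len] = '\0';
    return strtoul(buf, NULL, 8);
}
```

For the struct in the question, a call would look like parse_octal_field(h.size, sizeof h.size).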



Answer 2:

Actually, padding, name mangling and the like are governed not by the C standard but by the specific ABI: http://en.wikipedia.org/wiki/Application_binary_interface.

There are clear standards for how to pad data types so that they can be shared between different compilers. Your compiler's man page will most likely list the switches that change the ABI.



Answer 3:

The draft C99 and C11 standards say in section 6.7.2.1, Structure and union specifiers, paragraph 13 (paragraph 15 in C11):

[...]There may be unnamed padding within a structure object, but not at its beginning.

and in paragraph 15 (paragraph 17 in C11):

There may be unnamed padding at the end of a structure or union.