Let us suppose I would like to read/write a tar file header.
Considering standard C (C89, C99, or C11),
do char arrays have any special treatment in structs, regarding padding? Can the compiler add padding to such a struct:
struct header {
char name[100];
char mode[8];
char uid[8];
char gid[8];
char size[12];
char mtime[12];
char chksum[8];
char typeflag;
char linkname[100];
char tail[255];
};
I've seen it used in code on the web as well. Just freading, fwriting this struct to the file in one chunk, assuming there will not be any padding. Of course also assuming CHAR_BITS == 8
.
I'm thinking such C code is so common, the standard would deal with this case, but I just can't find it in it, maybe I would not be a good lawyer.
EDIT
The accepted answer would give a strict, or the strictest possible portable implementation according one of the C standards, that lets me treat these fields with standard library string functions. Considering CHAR_BITS
and all. I'm thinking one needs to read an array of 512 uint8_t
for this, and after that maybe convert them to chars, one by one. Any easier way?
C11 (the latest freely available draft) says only "There may be unnamed padding within a structure object, but not at its beginning" (§6.7.2.1 ¶15) and "There may be unnamed padding at the end of a structure or union" (§6.7.2.1 ¶17). It gives no further restriction on padding within a structure.
The platform ABI may have more stringent requirements on padding, but depending on this will be platform-specific, as other platforms may have other padding requirements. The x86-64 ABI for Unix/Linux gives char
1 byte alignment, and specifies:
Structures and unions assume the alignment of their most strictly aligned component. Each member is assigned to the lowest available offset with the appropriate
alignment. The size of any object is always a multiple of the object’s alignment.
An array uses the same alignment as its elements, except that a local or global
array variable of length at least 16 bytes or a C99 variable-length array variable
always has alignment of at least 16 bytes4
Structure and union objects can require padding to meet size and alignment
constraints. The contents of any padding is undefined.
4The alignment requirement allows the use of SSE instructions when operating on the array.
The compiler cannot in general calculate the size of a variable-length array (VLA), but it is ex-
pected that most VLAs will require at least 16 bytes, so it is logical to mandate that VLAs have at
least a 16-byte alignment.
This seems to imply that on this platform, there will be no padding within the struct. However, there are cases in which array variables have stricter alignment restriction in order to be able to be used with vector instructions; other platforms may impose such restrictions on array structure members as well.
If you would like to be portable, while reading the structure in a single call, you might want to look at readv
. This is a vectored or scatter/gather I/O operation, which allows you to specify an array of arrays and lengths to read into. For instance, for this case you might write:
struct header h;
struct iovec iov[10];
iov[0].iov_base = &h.name;
iov[0].iov_len = sizeof(h.name);
iov[1].iov_base = &h.mode;
iov[1].iov_len = sizeof(h.mode);
/* ... etc ... */
bytes_read = readv(fd, iov, 10);
Note that readv
is defined in POSIX/Single Unix Specification, not in the C standard. In standard C, the easiest thing to do is just read each of these elements individually (and even with vectored I/O available, just reading and writing each element individually will probably be more clear unless you absolutely need to use a single call for the whole I/O operation).
In your edit, you write:
The accepted answer would give a strict, or the strictest possible portable implementation according one of the C standards, that lets me treat these fields with standard library string functions. Considering CHAR_BITS
and all. I'm thinking one needs to read an array of 512 uint8_t
for this, and after that maybe convert them to chars, one by one. Any easier way?
The C specification does not guarantee that uint8_t
is available: "The typedef name uintN_t
designates an unsigned integer type with width N and no padding bits.... These types are optional." (C11 draft, §7.20.1.1, ¶2–3). However, if 8 bit values are available, then char
is guaranteed to be an 8 bit value, as it is guaranteed to be at least 8 bits and is guaranteed to be the smallest object that is not a bit-field (§5.2.4.2.1 ¶1):
The values given below shall be replaced by constant expressions suitable for use in #if
preprocessing directives. Moreover, except for CHAR_BIT
and MB_LEN_MAX
, the following shall be replaced by expressions that have the same type as would an expression that is an object of the corresponding type converted according to the integer promotions. Their implementation-defined values shall be equal or greater in magnitude (absolute value) to those shown, with the same sign.
- — number of bits for smallest object that is not a bit-field (byte)
CHAR_BIT 8
So, if you don't have an 8-bit bytes available, you won't be able to read these fields in directly and access octets from them as individual array elements; you would have to manually split out individual bytes using bit shifting and masking. However, there are no modern architectures that I know of which lack 8 bit bytes (for general purpose computing, where file I/O is at all a concern; some DSPs might, but they probably won't have standard C file I/O).
If you do have an 8-bit bytes, then char
is guaranteed to be 8 bits, so there's not much benefit other than clarity for using uint8_t
vs char
. If you're really concerned, I would just ensure that you have a check somewhere in your build process that CHAR_BIT
is 8 and call it good.
Actually padding, name mangling and such is not governed by the C standard but the specific ABI: http://en.wikipedia.org/wiki/Application_binary_interface.
There are clear standards how to pad datatypes so that they can be shared between different compilers. Your man page will most likely tell you switches to change the ABI.
The draft C99 and C11 standard says in section 6.7.2.1
Structure and union specifiers in paragraph 13(paragraph 15 in C11):
[...]There may be unnamed padding within a structure object, but not at its beginning.
and in paragraph 15(paragraph 17 in C11):
There may be unnamed padding at the end of a structure or union.