Are there any guarantees about C struct order?

Published 2020-04-12 08:32

叛逆

Question:

I've used structs extensively and I've seen some interesting things, especially *value instead of value->first_value, where value is a pointer to a struct and first_value is its very first member. Is *value safe?

Also note that sizes aren't guaranteed because of alignment. What is the alignment value based on: the architecture or register size?

We align data and code for faster execution. Can we tell the compiler not to do this, so that we can guarantee certain things about structs, such as their size?

When doing pointer arithmetic on struct members in order to locate a member's offset, I take it you subtract on little-endian and add on big-endian machines, or does it just depend on the compiler?

What does malloc(0) really allocate?

The following code is for educational/discovery purposes; it's not meant to be of production quality.

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    /* Note: an empty struct is a GNU extension; ISO C requires at least one member. */
    /* sizeof yields a size_t, so it is printed with %zu rather than %lu. */
    printf("sizeof(struct {}) == %zu;\n", sizeof(struct {}));
    printf("sizeof(struct {int a}) == %zu;\n", sizeof(struct {int a;}));
    printf("sizeof(struct {int a; double b;}) == %zu;\n", sizeof(struct {int a; double b;}));
    printf("sizeof(struct {char c; double a; double b;}) == %zu;\n", sizeof(struct {char c; double a; double b;}));

    printf("malloc(0)) returns %p\n", malloc(0));
    printf("malloc(sizeof(struct {})) returns %p\n", malloc(sizeof(struct {})));

    struct {int a; double b;} *test = malloc(sizeof(struct {int a; double b;}));
    test->a = 10;
    test->b = 12.2;
    printf("test->a == %i, *test == %i \n", test->a, *(int *)test);
    /* Cast to char * for byte arithmetic; arithmetic on void * is a GNU extension. */
    printf("test->b == %f, offset of b is %i, *(test - offset_of_b) == %f\n",
        test->b, (int)((char *)test - (char *)&test->b),
        *(double *)((char *)test - ((char *)test - (char *)&test->b))); // find the offset of b, add it to the base,$

    free(test);
    return 0;
}

Compiling with gcc test.c and then running ./a.out, I get this:

sizeof(struct {}) == 0;
sizeof(struct {int a}) == 4;
sizeof(struct {int a; double b;}) == 16;
sizeof(struct {char c; double a; double b;}) == 24;
malloc(0)) returns 0x100100080
malloc(sizeof(struct {})) returns 0x100100090
test->a == 10, *test == 10 
test->b == 12.200000, offset of b is -8, *(test - offset_of_b) == 12.200000

Update: this is my machine:

gcc --version

i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5666) (dot 3)
Copyright (C) 2007 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

uname -a

Darwin MacBookPro 10.8.0 Darwin Kernel Version 10.8.0: Tue Jun  7 16:33:36 PDT 2011; root:xnu-1504.15.3~1/RELEASE_I386 i386

Answer 1:

I've used structs extensively and I've seen some interesting things, especially *value instead of value->first_value, where value is a pointer to a struct and first_value is its very first member. Is *value safe?

Yes, *value is safe; it yields a copy of the structure that value points at. But it is almost guaranteed to have a different type from *value->first_value, so the result of *value will almost always be different from *value->first_value.

Counter-example:

struct something { struct something *first_value; ... };
struct something data = { ... };
struct something *value = &data;
value->first_value = value;

Under this rather limited set of circumstances, you would get the same result from *value and *value->first_value. Under that scheme, the types would be the same (even if the values are not). In the general case, *value and *value->first_value have different types.

Also note that sizes aren't guaranteed because of alignment, but is alignment always on register size?

Since 'register size' is not a defined C concept, it isn't clear what you're asking. In the absence of pragmas (#pragma pack or similar), the elements of a structure will be aligned for optimal performance when the value is read (or written).

We align data/code for faster execution; can we tell compiler not to do this? So maybe we can guarantee certain things about structs, like their size?

The compiler is in charge of the size and layout of struct types. You can influence them by careful design and perhaps by #pragma pack or similar directives.
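As a hedged illustration of the #pragma pack route: the sketch below assumes a compiler that accepts #pragma pack(push, 1) and #pragma pack(pop), which GCC, Clang and MSVC do, although the directive is not part of ISO C. The struct names padded and packed and the helper padding_saved() are made up for this example.

```c
#include <stddef.h>

/* Default layout: the compiler may insert padding after c so that i is aligned. */
struct padded {
    char c;
    int  i;
};

/* Non-standard but widely supported: request 1-byte packing for this type only. */
#pragma pack(push, 1)
struct packed {
    char c;
    int  i;
};
#pragma pack(pop)

/* Number of bytes the default layout uses beyond the packed layout. */
size_t padding_saved(void)
{
    return sizeof(struct padded) - sizeof(struct packed);
}
```

Packing trades speed and portability for a predictable size: members of a packed struct may be misaligned, which can be slow or even fault on some architectures.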

These questions normally arise when people are concerned about serializing data (or, rather, trying to avoid having to serialize data by processing structure elements one at a time). Generally, I think you're better off writing a function to do the serialization, building it up from component pieces.
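A minimal sketch of that field-by-field approach, assuming a made-up struct record and a made-up wire format (one byte for kind, then length as four big-endian bytes). None of these names come from the question; they are only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record used only for this example. */
struct record {
    uint8_t  kind;
    uint32_t length;
};

/* Wire size: 1 byte for kind + 4 bytes for length, with no padding. */
enum { RECORD_WIRE_SIZE = 5 };

/* Write a 32-bit value in big-endian order, independent of host endianness. */
static void put_u32_be(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Read a 32-bit big-endian value back into host order. */
static uint32_t get_u32_be(const unsigned char *in)
{
    return ((uint32_t)in[0] << 24) | ((uint32_t)in[1] << 16)
         | ((uint32_t)in[2] << 8)  |  (uint32_t)in[3];
}

/* Serialize one member at a time; returns the number of bytes written. */
size_t record_serialize(const struct record *r, unsigned char *out)
{
    out[0] = r->kind;
    put_u32_be(out + 1, r->length);
    return RECORD_WIRE_SIZE;
}

/* Rebuild the struct from the wire bytes, again one member at a time. */
void record_deserialize(struct record *r, const unsigned char *in)
{
    r->kind   = in[0];
    r->length = get_u32_be(in + 1);
}
```

Because the bytes are written explicitly, the result does not depend on the compiler's padding or on the host's byte order.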

When doing pointer arithmetic on struct members in order to locate member offset, I take it you do subtraction if little endian, addition for big endian, or does it just depend on the compiler?

You're probably best off not doing pointer arithmetic on struct members. If you must, use the offsetof() macro from <stddef.h> to handle the offsets correctly (and that means you're not doing pointer arithmetic directly). The first structure element is always at the lowest address, regardless of big-endianness or little-endianness. Indeed, endianness has no bearing on the layout of different members within a structure; it only has an effect on the byte order of values within a (basic data type) member of a structure.
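A short sketch of how offsetof() can be used, assuming a made-up struct pair that mirrors the {int a; double b;} struct from the question. The helper names are hypothetical. Moving from the base to a member always adds the offset, and moving from a member back to the base always subtracts it, whatever the endianness.

```c
#include <stddef.h>

/* Hypothetical struct mirroring the one in the question. */
struct pair {
    int    a;
    double b;
};

/* Locate member b from a pointer to the whole struct: base address plus offset. */
double *pair_b_from_base(struct pair *p)
{
    return (double *)((char *)p + offsetof(struct pair, b));
}

/* Recover the enclosing struct from a pointer to its b member: member address minus offset. */
struct pair *pair_from_b(double *b)
{
    return (struct pair *)((char *)b - offsetof(struct pair, b));
}
```

The casts go through char * so that the arithmetic is done in bytes, which is the unit offsetof() reports.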

The C standard requires that the elements of a structure are laid out in the order that they are defined; the first element is at the lowest address, and the next at a higher address, and so on for each element. The compiler is not allowed to change the order. There can be no padding before the first element of the structure. There can be padding after any element of the structure as the compiler sees fit to ensure what it considers appropriate alignment. The size of a structure is such that you can allocate (N × size) bytes that are appropriately aligned (e.g. via malloc()) and treat the result as an array of the structure.

Answer 2:

From 6.2.5/20:

A structure type describes a sequentially allocated nonempty set of member objects (and, in certain circumstances, an incomplete array), each of which has an optionally specified name and possibly distinct type.

To answer:

especially *value instead of value->first_value where value is a pointer to struct, first_value is the very first member, is *value safe?

see 6.7.2.1/15:

15 Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared. A pointer to a structure object, suitably converted, points to its initial member (or if that member is a bit-field, then to the unit in which it resides), and vice versa. There may be unnamed padding within a structure object, but not at its beginning.¹

There may, however, be padding bytes between members and at the end of the structure.
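The "suitably converted" wording can be sketched as follows, assuming a made-up struct header. The helper names are hypothetical, and header_from_first() is only valid when the int it receives really is the first member of a struct header.

```c
/* Hypothetical struct used only for this example. */
struct header {
    int    id;
    double payload;
};

/* A pointer to the struct, converted to int *, points at its initial member. */
int first_member_via_cast(const struct header *h)
{
    return *(const int *)h;
}

/* The reverse direction: a pointer to the initial member, converted back,
   points at the enclosing struct. */
const struct header *header_from_first(const int *id)
{
    return (const struct header *)id;
}
```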

In C, the behavior of malloc(0) is implementation-defined: it returns either a null pointer or a unique pointer that must not be dereferenced. (As a side note, this is one of those little things where C and C++ differ.)

[1] Emphasis mine.

Answer 3:

Calling malloc(0) will return a pointer that may be safely passed to free() at least once. If the same value is returned by multiple malloc(0) calls, it may be freed once for each such call. Obviously, if it returns NULL, that could be passed to free() an unlimited number of times without effect. Every call to malloc(0) which returns non-null should be balanced by a call to free() with the returned value.
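A minimal sketch of that pairing rule; the function name malloc_zero_returns_pointer() is made up for this example. It never dereferences the result and releases it exactly once, which is safe whether the implementation returned NULL or a real pointer.

```c
#include <stdlib.h>

/* Returns 1 if malloc(0) gave back a non-null pointer, 0 if it gave back NULL.
   Either way the result is passed to free() exactly once. */
int malloc_zero_returns_pointer(void)
{
    void *p = malloc(0);
    int non_null = (p != NULL);
    free(p);   /* free(NULL) is defined to do nothing */
    return non_null;
}
```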

Answer 4:

If you have an inner structure, it is guaranteed to start at the same address as the enclosing one if it is the first member of the enclosing structure.

So *value and value->first access memory at the same address (but using different types) in the following:

struct St {
  long first;
} *value;

Also, the ordering of the members of the structure is guaranteed to be the same as the declaration order.

To adjust alignment, you can use compiler-specific directives or use bit-fields.

The alignment of structure members is usually based on what's best for accessing the individual members on the target platform.

Also, for malloc, it is possible that it keeps some bookkeeping near the returned address, so even for a zero-size request it can return a valid address (just don't try to access anything via the returned address).

Answer 5:

It is important to learn how the size of a struct is determined. For example:

struct foo {
  int i;
  char c;
};

struct bar {
  int i;
  int j;
};

struct baz {
  int i;
  char c;
  int j;
};

sizeof(struct foo) = 8 bytes (32-bit arch)
sizeof(struct bar) = 8 bytes
sizeof(struct baz) = 12 bytes

What this means is that struct sizes and offsets follow two rules:

1- The size of the struct must be a multiple of its strictest member alignment, here 4 bytes for int (which is why foo is 8 bytes, not 5).

2- Each member must start at an offset that is a multiple of its own alignment, which for basic types is usually its size. (In baz, int j cannot start at offset 5, so bytes 5, 6, and 7 are padding and j starts at offset 8.)
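A hedged sketch that checks the layout of these structs with offsetof() and the C11 _Alignof operator instead of hard-coding byte counts, since exact sizes depend on the platform. The helper names baz_layout_ok() and baz_padding() are made up for this example.

```c
#include <stddef.h>

struct foo { int i; char c; };
struct bar { int i; int j; };
struct baz { int i; char c; int j; };

/* Returns 1 if baz follows the expected layout rules:
   members in declaration order, j aligned for int, size a multiple
   of the struct's own alignment. */
int baz_layout_ok(void)
{
    return offsetof(struct baz, i) == 0
        && offsetof(struct baz, c) >= sizeof(int)
        && offsetof(struct baz, j) >  offsetof(struct baz, c)
        && offsetof(struct baz, j) % _Alignof(int) == 0
        && sizeof(struct baz) % _Alignof(struct baz) == 0;
}

/* Bytes in baz that hold no member data (padding). */
size_t baz_padding(void)
{
    return sizeof(struct baz) - (2 * sizeof(int) + sizeof(char));
}
```

On a platform where int is 4 bytes with 4-byte alignment, baz_padding() would be expected to report 3, matching the three padding bytes after c.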