How does one use `offsetof` to access a field in a s

Posted 2020-08-21 02:44

Question:

Let's suppose I have a struct and extract the offset to a member:

struct A {
    int x;
};

size_t xoff = offsetof(struct A, x);

How can I, given a pointer to struct A, extract the member in a standard-conforming way? Assume, of course, that we have a correct struct A* and a correct offset. One attempt would be something like:

int getint(struct A* base, size_t off) {
    return *(int*)((char*)base + off); 
}

This will probably work, but note that pointer arithmetic only seems to be defined in the standard when the pointers point into the same array (or one past its end), which need not be the case here. So technically that construct would seem to rely on undefined behaviour.

Another approach would be

int getint(struct A* base, size_t off) {
    return *(int*)((uintptr_t)base + off);
}

This would probably also work, but note that intptr_t and uintptr_t are not required to exist, and as far as I know arithmetic on such values need not yield the correct result (for example, I recall that some CPUs can handle non-byte-aligned addresses, which would suggest that the integer value increases in steps of 8 for each char in an array).

It looks like there's something forgotten in the standard (or something I've missed).

Answer 1:

Per the C Standard, 7.19 Common definitions <stddef.h>, paragraph 3, offsetof() is defined as:

The macros are

NULL

which expands to an implementation-defined null pointer constant; and

offsetof(*type*, *member-designator*)

which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type).

So, offsetof() yields an offset in bytes.
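As a rough illustration of what "offset in bytes" means, the sketch below compares offsetof against the distance measured with char pointers into the same object. The struct `Padded` and the function `offsets_match` are hypothetical names introduced only for this example, and `_Static_assert` assumes a C11 compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical struct for illustration; any padding between c and x
   is chosen by the implementation. */
struct Padded {
    char c;
    int  x;
};

/* Returns 1 if offsetof agrees with the byte distance measured
   through char pointers into the same object, 0 otherwise. */
int offsets_match(void)
{
    struct Padded p = { 'a', 42 };
    const char *start = (const char *)&p;
    const char *field = (const char *)&p.x;

    /* offsetof is an integer constant expression, so it can appear
       in a static assertion (C11). */
    _Static_assert(offsetof(struct Padded, c) == 0,
                   "the first member sits at offset 0");

    assert(field >= start);
    return (size_t)(field - start) == offsetof(struct Padded, x);
}
```

The actual value of offsetof(struct Padded, x) is deliberately not printed or assumed here, since it depends on the implementation's alignment rules.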

And 6.2.6.1 General, paragraph 4 states:

Values stored in non-bit-field objects of any other object type consist of n × CHAR_BIT bits, where n is the size of an object of that type, in bytes.

Since CHAR_BIT is defined as the number of bits in a char, a char is one byte.

So, this is correct, per the standard:

int getint(struct A* base, size_t off) {
    return *(int*)((char*)base + off); 
}

That converts base to a char * and adds off bytes to the address. If off is the result of offsetof(struct A, x), the resulting address is the address of x within the struct A that base points to.
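A minimal self-contained sketch of this accessor might look like the following. The second member `y`, the bounds check, and the helper `demo_getint` are assumptions added for illustration (the struct in the question has only `x`); the extra member is there only so that the offset is nonzero.

```c
#include <assert.h>
#include <stddef.h>

struct A {
    int x;
    int y;   /* hypothetical extra member so the demo uses a nonzero offset */
};

/* Reads the int located `off` bytes from the start of *base.
   Assumes `off` was produced by offsetof(struct A, m) for an int member m. */
int getint(const struct A *base, size_t off)
{
    assert(base != NULL);
    assert(off + sizeof(int) <= sizeof(struct A));   /* stay inside the object */
    return *(const int *)((const char *)base + off);
}

/* Hypothetical usage: fetch member y through its offset. */
int demo_getint(void)
{
    struct A a = { 10, 20 };
    return getint(&a, offsetof(struct A, y));
}
```

Calling `demo_getint()` reads `a.y` through its offset rather than by name.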

Your second example:

int getint(struct A* base, size_t off) {
    return *(int*)((intptr_t)base + off);
}

depends on the result of adding the signed intptr_t value to the unsigned size_t value being unsigned.



Answer 2:

The reason the standard (6.5.6) only allows pointer arithmetic within arrays is that structs may have padding bytes to satisfy alignment requirements. So doing pointer arithmetic inside a struct is indeed formally undefined behavior.

In practice, it will work as long as you know what you are doing. base + off cannot fail, because we know that there is valid data there and it is not misaligned, given that it is accessed properly.

Therefore (intptr_t)base + off is indeed much better code, as there is no longer any pointer arithmetic, just plain integer arithmetic: intptr_t is an integer type, not a pointer type.

As pointed out in a comment, this type is not guaranteed to exist; it is optional per 7.20.1.4/1. For maximum portability, I suppose you could switch to other types that are guaranteed to exist, such as intmax_t or ptrdiff_t. It is arguable, however, whether a C99/C11 compiler without support for intptr_t is useful at all.

(There is a small type issue here, namely that intptr_t is a signed type, and not necessarily compatible with size_t. You might get implicit type promotion issues. It is safer to use uintptr_t if possible.)
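One way to sketch the unsigned variant is shown below. It checks the `UINTPTR_MAX` macro, which <stdint.h> defines only when uintptr_t exists, and falls back to char pointer arithmetic otherwise. The names `getint_uintptr` and `demo_getint_uintptr` and the extra member `y` are hypothetical. The sketch also assumes that adding a byte count to the converted integer gives the address of that byte, which holds on common flat-address platforms but is implementation-defined rather than promised by the standard.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct A {
    int x;
    int y;   /* hypothetical extra member so the demo uses a nonzero offset */
};

#ifdef UINTPTR_MAX
/* uintptr_t is available: do the offset arithmetic on an unsigned integer. */
int getint_uintptr(const struct A *base, size_t off)
{
    assert(base != NULL);
    uintptr_t addr = (uintptr_t)(const void *)base;
    /* ASSUMPTION: integer + byte offset maps back to the address of that
       byte. The pointer/integer mapping is implementation-defined. */
    addr += off;
    return *(const int *)(const void *)addr;
}
#else
/* uintptr_t is not provided: fall back to char pointer arithmetic. */
int getint_uintptr(const struct A *base, size_t off)
{
    assert(base != NULL);
    return *(const int *)((const char *)base + off);
}
#endif

/* Hypothetical usage: fetch member y through its offset. */
int demo_getint_uintptr(void)
{
    struct A a = { 10, 20 };
    return getint_uintptr(&a, offsetof(struct A, y));
}
```

Because both operands of the addition are unsigned, the signed/unsigned mixing mentioned above does not arise in this version.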

The next question then is if *(int*)((intptr_t)base + off) is well-defined behavior. The part of the standard regarding pointer conversions (6.3.2.3) says that:

Any pointer type may be converted to an integer type. Except as previously specified, the result is implementation-defined. If the result cannot be represented in the integer type, the behavior is undefined. The result need not be in the range of values of any integer type.

For this specific case, we know that we have a correctly aligned int there, so it is fine.

(I don't believe that any pointer aliasing concerns apply either. At least compiling with gcc -O3 -fstrict-aliasing -Wstrict-aliasing=2 doesn't break the code.)
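For readers who would rather not rely on the cast to int * at all, a commonly used alternative (not one proposed in either answer) is to copy the bytes out with memcpy, which avoids dereferencing a converted pointer. The sketch below still uses char-based pointer arithmetic to locate the bytes; `getint_memcpy`, `demo_getint_memcpy`, and the extra member `y` are hypothetical names for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct A {
    int x;
    int y;   /* hypothetical extra member so the demo uses a nonzero offset */
};

/* Copies sizeof(int) bytes starting `off` bytes into *base into a local int.
   Assumes `off` was produced by offsetof(struct A, m) for an int member m. */
int getint_memcpy(const struct A *base, size_t off)
{
    int value;
    assert(base != NULL);
    assert(off + sizeof value <= sizeof(struct A));   /* stay inside the object */
    memcpy(&value, (const unsigned char *)base + off, sizeof value);
    return value;
}

/* Hypothetical usage: fetch member y through its offset. */
int demo_getint_memcpy(void)
{
    struct A a = { 10, 20 };
    return getint_memcpy(&a, offsetof(struct A, y));
}
```

Compilers typically turn a fixed-size memcpy like this into a plain load, so the cost is usually the same as the cast-based version, though that is an optimization and not a guarantee.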