Let's suppose I have a struct and extract the offset to a member:
struct A {
int x;
};
size_t xoff = offsetof(struct A, x);
How can I, given a pointer to struct A, extract the member in a standard-conforming way? Assuming, of course, that we have a correct struct A* and a correct offset. One attempt would be something like:
int getint(struct A* base, size_t off) {
return *(int*)((char*)base + off);
}
This will probably work, but note that pointer arithmetic only seems to be defined in the standard when the pointers point into the same array (or one past its end), which need not be the case here. So technically that construct would seem to rely on undefined behaviour.
Another approach would be
int getint(struct A* base, size_t off) {
return *(int*)((intptr_t)base + off);
}
which would probably also work, but note that intptr_t is not required to exist and, as far as I know, arithmetic on intptr_t need not yield the correct result (for example, I recall that some CPUs can handle non-byte-aligned addresses, which would suggest that intptr_t increases in steps of 8 for each char in an array).
It looks like something has been forgotten in the standard (or I've missed something).
Per the C Standard, 7.19 Common definitions <stddef.h>, paragraph 3, offsetof() is defined as:
The macros are
NULL
which expands to an implementation-defined null pointer constant; and
offsetof(type, member-designator)
which expands to an integer constant expression that has type size_t, the value of which is the offset in bytes, to the structure member (designated by member-designator), from the beginning of its structure (designated by type).
So, offsetof() yields an offset in bytes.
And 6.2.6.1 General, paragraph 4 states:
Values stored in non-bit-field objects of any other object type
consist of
n × CHAR_BIT bits, where n is the size of an object of that type, in bytes.
Since CHAR_BIT is defined as the number of bits in a char, a char is one byte.
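The byte-granularity argument can be checked at compile time. The following is only a minimal sketch, assuming a C11 compiler (for static_assert); the enumerator name X_OFFSET is made up here for illustration and does not appear in the question:

```c
#include <assert.h>   /* static_assert (C11) */
#include <limits.h>   /* CHAR_BIT */
#include <stddef.h>   /* offsetof, size_t */

struct A {
    int x;
};

/* sizeof counts in units of char, so a char is one byte by definition. */
static_assert(sizeof(char) == 1, "char is exactly one byte");

/* A byte holds CHAR_BIT bits; the standard requires at least 8. */
static_assert(CHAR_BIT >= 8, "a byte has at least 8 bits");

/* offsetof() is an integer constant expression, so it can initialise an
   enumerator. X_OFFSET is a hypothetical name used only in this sketch.
   The first member of a struct is always at offset 0. */
enum { X_OFFSET = offsetof(struct A, x) };
static_assert(X_OFFSET == 0, "no padding before the first member");
```

If any of these assertions did not hold, the translation unit would fail to compile rather than misbehave at run time.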
So, this is correct, per the standard:
int getint(struct A* base, size_t off) {
return *(int*)((char*)base + off);
}
That converts base to a char * and adds off bytes to the address. If off is the result of offsetof(struct A, x), the resulting address is the address of x within the struct A that base points to.
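To see the char * version at work with a nonzero offset, here is a small sketch. Note the assumptions: the tag member and the demo_getint() helper are not part of the question and are added only so that x does not sit at offset 0 and so that there is something to call:

```c
#include <assert.h>
#include <stddef.h>

struct A {
    char tag;  /* hypothetical extra member, so x gets a nonzero offset */
    int x;
};

/* Read the int located `off` bytes past the start of *base. */
static int getint(struct A *base, size_t off)
{
    return *(int *)((char *)base + off);
}

/* Hypothetical helper: builds a struct, reads x back through its offset. */
static int demo_getint(void)
{
    struct A a = { 'a', 42 };
    size_t xoff = offsetof(struct A, x);

    /* The computed address is exactly the address of the member. */
    assert((char *)&a + xoff == (char *)&a.x);

    return getint(&a, xoff);
}
```

Calling demo_getint() reads a.x back through its byte offset, so it returns the 42 stored in the initialiser.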
Your second example:
int getint(struct A* base, size_t off) {
return *(int*)((intptr_t)base + off);
}
depends on the sum of the signed intptr_t value and the unsigned size_t value being computed as an unsigned value.
The reason the standard (6.5.6) only allows pointer arithmetic within arrays is that structs may have padding bytes to satisfy alignment requirements. So doing pointer arithmetic inside a struct is indeed formally undefined behavior.
In practice, it will work as long as you know what you are doing. base + off cannot fail, because we know that there is valid data there and that it is not misaligned, given that it is accessed properly.
Therefore (intptr_t)base + off is indeed much better code, as there is no longer any pointer arithmetic, just plain integer arithmetic: intptr_t is an integer type, not a pointer type.
As pointed out in a comment, this type is not guaranteed to exist; it is optional as per 7.20.1.4/1. I suppose that for maximum portability you could switch to other types that are guaranteed to exist, such as intmax_t or ptrdiff_t. It is, however, arguable whether a C99/C11 compiler without support for intptr_t is useful at all.
(There is a small type issue here, namely that intptr_t is a signed type, and not necessarily compatible with size_t. You might get implicit type promotion issues. It is safer to use uintptr_t if possible.)
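A sketch of what the uintptr_t variant could look like, with the conversions spelled out. It assumes that uintptr_t exists on the target and that adding a byte offset to the converted value yields the address of the member, which is implementation-defined but holds on common flat-address platforms. The tag member and the names getint_uint() and demo_getint_uint() are made up for this example:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>   /* uintptr_t (optional type, assumed to exist here) */

struct A {
    char tag;  /* hypothetical extra member, so x gets a nonzero offset */
    int x;
};

/* Same idea as getint(), but using unsigned integer arithmetic only. */
static int getint_uint(struct A *base, size_t off)
{
    uintptr_t addr = (uintptr_t)base;  /* pointer -> integer: implementation-defined */
    addr += off;                       /* both operands unsigned, no sign mixing */
    return *(int *)addr;               /* integer -> pointer: implementation-defined */
}

/* Hypothetical helper: builds a struct, reads x back through its offset. */
static int demo_getint_uint(void)
{
    struct A a = { 'b', 7 };
    return getint_uint(&a, offsetof(struct A, x));
}
```

Because both operands of the addition are unsigned, the signed/unsigned conversion question from the intptr_t version does not arise.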
The next question, then, is whether *(int*)((intptr_t)base + off) is well-defined behavior. The part of the standard regarding pointer conversions (6.3.2.3) says that:
Any pointer type may be converted to an integer type. Except as
previously specified, the result is implementation-defined. If the
result cannot be represented in the integer type, the behavior is
undefined. The result need not be in the range of values of any
integer type.
For this specific case, we know that we have a correctly aligned int there, so it is fine.
(I don't believe that any pointer aliasing concerns apply either. At least compiling with gcc -O3 -fstrict-aliasing -Wstrict-aliasing=2 doesn't break the code.)