I have a question regarding structs in C. So when you create a struct, you are essentially defining the framework of a block of memory. Thus when you create an instance of a struct, you are creating a block of memory such that it is capable of holding a certain number of elements.
However, I'm somewhat confused about what the dot operator is doing. If I have a struct Car with a member called GasMileage (which is an int member), I am able to get the value of GasMileage by doing something like:
int x = CarInstance.GasMileage;
However, I'm confused as to what is actually happening with this dot operator. Does the dot operator simply act as an offset from the base address? And how exactly is it able to deduce that it is an int?
I guess I'm curious as to what is going on behind the scenes. Would it be possible to reference GasMileage by doing something else? Such as:
int *GasMileagePointer = (&carInstance + offsetInBytes(GasMileage));
int x = *GasMileagePointer;
This is just something I quickly made up. I've tried hard to find a good explanation, but nothing seems to explain it any further than treating the dot operator as magic.
When it works, the behavior of the "." operator is equivalent to taking the address of the structure, adding the byte offset of the member, converting the result to a pointer to the member type, and dereferencing it. The Standard, however, provides that there are situations where that isn't guaranteed to work. For example, given:
a compiler may decide that there's no legitimate way that p1->x and p2->x can identify the same object, so it may reorder the code so that the ++ and -- operations on p1->x cancel, and the ^=1 operations on p2->x cancel, thus leaving a function that does nothing.
Note that the behavior is different when using unions, since given:
the common initial sequence rule indicates that, since u->v1 and u->v2 start with fields of the same types, an access to such a field in u->v1 is equivalent to an access to the corresponding field in u->v2. Thus, a compiler is not allowed to resequence things. On the other hand, given
the fact that u.v1 and u.v2 start with matching fields doesn't guard against a compiler's assumption that the pointers won't alias.
Note that some compilers offer an option to force generation of code where member accesses always behave equivalently to the aforementioned pointer operations. For gcc, the option is -fno-strict-aliasing. If code will need to access common initial members of varying structure types, omitting that switch may cause one's code to fail in weird, bizarre, and unpredictable ways.
The dot operator simply selects the member.
Since the compiler has information about the type (and consequently size) of the member (all members, actually), it knows the offset of the member from the start of the struct and can generate appropriate instructions. It may generate a base+offset access, but it also may access the member directly (or even have it cached in a register). The compiler has all those options since it has all the necessary information at compile time.
If it doesn't have that information, as with incomplete types, you'll get a compile-time error.
Yes, the dot operator simply applies an offset from the base of the structure, and then accesses the value at that address.
is equivalent to:
For a member with some other type T, the only difference is that the cast (int *) becomes (T *).
When you use the . operator, the compiler translates this to an offset inside the struct, based on the size of the fields (and padding) that precede it. For example:
Assuming an int is 4 bytes and no padding, the offset of model is 0, the offset of doors is 52, and the offset of GasMilage is 56.
So if you know the offset of the member, you could get a pointer to it like this:
The cast to char * is necessary so that pointer arithmetic goes 1 byte at a time instead of sizeof(carInstance) bytes at a time. Then the result needs to be cast to the correct pointer type, in this case int *.