After seeing this question a few minutes ago, I wondered why the language designers allow it, since it permits indirect modification of private data. As an example:
class TestClass {
private:
int cc;
public:
TestClass(int i) : cc(i) {};
};
TestClass cc(5);
int* pp = (int*)&cc;
*pp = 70; // private member has been modified
I tested the above code and indeed the private data has been modified. Is there any explanation of why this is allowed to happen, or is this just an oversight in the language? It seems to directly undermine the use of private data members.
Because, as Bjarne puts it, C++ is designed to protect against Murphy, not Machiavelli.
In other words, it's supposed to protect you from accidents -- but if you go to any work at all to subvert it (such as using a cast) it's not even going to attempt to stop you.
When I think of it, I have a somewhat different analogy in mind: it's like the lock on a bathroom door. It gives you a warning that you probably don't want to walk in there right now, but it's trivial to unlock the door from the outside if you decide to.
Edit: as to the question @Xeo discusses, about why the standard says "have the same access control" instead of "have all public access control", the answer is long and a little tortuous.
Let's step back to the beginning and consider a struct like:
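A minimal sketch of the kind of struct meant here. The name X, its two int members, and the helper first_member_via_cast are assumptions made up for illustration; the static_asserts just spell out the two C rules as checks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical C-style struct; the names are illustrative only.
struct X {
    int a;
    int b;
};

// Rule 1: the first member sits at offset 0, so the address of the
// struct is also the address of a.
static_assert(offsetof(X, a) == 0, "a must be at the start of X");

// Rule 2: members appear in memory in declaration order
// (padding between them is allowed).
static_assert(offsetof(X, a) < offsetof(X, b), "a must precede b");

// Reads X::a through a pointer obtained by casting the struct pointer.
int first_member_via_cast(X* x) {
    int* p = reinterpret_cast<int*>(x);
    // Both pointers refer to the same address.
    assert(static_cast<void*>(p) == static_cast<void*>(&x->a));
    return *p;
}
```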
C always had a few rules for a struct like this. One is that in an instance of the struct, the address of the struct itself has to equal the address of a, so you can cast a pointer to the struct to a pointer to int, and access a with well defined results. Another is that the members have to be arranged in the same order in memory as they are defined in the struct (though the compiler is free to insert padding between them).
For C++, there was an intent to maintain that, especially for existing C structs. At the same time, there was an apparent intent that if the compiler wanted to enforce private (and protected) at run-time, it should be easy to do that (reasonably efficiently).
Therefore, given something like:
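A sketch of such a struct, assuming member names that line up with the Y::a through Y::e used in this answer. The constructor and sum_private are hypothetical additions so the snippet can be exercised, and the static_assert assumes a C++11 compiler.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical struct mixing access levels; names are illustrative.
struct Y {
    int a;
    int b;
private:
    int c;
    int d;
public:
    int e;

    // Code that uses c and d goes here.
    Y() : a(1), b(2), c(3), d(4), e(5) {}
    int sum_private() const { return c + d; }
};

// Because its data members do not all share one access level,
// C++11 does not treat Y as a standard-layout class.
static_assert(!std::is_standard_layout<Y>::value,
              "mixed access control rules out standard layout");
```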
The compiler should be required to maintain the same rules as C with respect to Y.a and Y.b. At the same time, if it's going to enforce access at run time, it may want to move all the public variables together in memory, so the public members would come first, followed by the private ones.
Then, when it's enforcing things at run-time, it can basically do something like
if (offset >= 3 * sizeof(int)) access_violation();
To my knowledge nobody's ever done this, and I'm not sure the rest of the standard really allows it, but there does seem to have been at least the half-formed germ of an idea along that line.
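The regrouped layout and the offset check described above might look roughly like this. It is purely illustrative: Z, public_bytes, and access_allowed are assumptions sketched from the description, not something any compiler is known to do.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical layout after grouping the three public members of Y
// at the front. A plain struct is used so offsetof is well defined.
struct Z {
    int a;  // public in Y
    int b;  // public in Y
    int e;  // public in Y
    int c;  // private in Y
    int d;  // private in Y
};

// Size of the public region at the front of the object.
constexpr std::size_t public_bytes = 3 * sizeof(int);

// Returns true if a byte offset into the object falls inside the
// public region, false if it lands on a private member.
bool access_allowed(std::size_t offset) {
    return offset < public_bytes;
}
```

With this layout, one comparison against a single boundary is enough to tell public from private, which is what would make such a check cheap.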
To enforce both of those, C++98 said Y::a and Y::b had to be in that order in memory, and Y::a had to be at the beginning of the struct (i.e., C-like rules). But, because of the intervening access specifiers, Y::c and Y::e no longer had to be in order relative to each other. In other words, all the consecutive variables defined without an access specifier between them were grouped together, and the compiler was free to rearrange those groups (but still had to keep the first one at the beginning).
That was fine until some jerk (i.e., me) pointed out that the way the rules were written had another little problem. If I wrote code like:
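A sketch of the kind of code meant, with hypothetical names. The static_asserts are additions for illustration and assume a C++11 compiler.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical struct: every member is public, but a redundant
// access specifier sits between each pair of members.
struct A {
    int a;
public:
    int b;
public:
    int c;
public:
    int d;
};

// All members have the same access, so C++11 treats A as
// standard-layout despite the repeated specifiers.
static_assert(std::is_standard_layout<A>::value, "A is standard-layout");
static_assert(offsetof(A, a) == 0, "first member at the start");

// Reads A::a through a pointer to the whole struct.
int first_of(A* p) {
    return *reinterpret_cast<int*>(p);
}
```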
...you ended up with a little bit of self-contradiction. On one hand, this was still officially a POD struct, so the C-like rules were supposed to apply -- but since you had (admittedly meaningless) access specifiers between the members, it also gave the compiler permission to rearrange the members, thus breaking the C-like rules they intended.
To cure that, they re-worded the standard a little so it would talk about the members all having the same access, rather than about whether or not there was an access specifier between them. Yes, they could have just decreed that the rules would only apply to public members, but it would appear that nobody saw anything to be gained from that. Given that this was modifying an existing standard with lots of code that had been in use for quite a while, they opted for the smallest change they could make that would still cure the problem.
The compiler would have given you an error if you had tried int *pp = &cc.cc; it would have told you that you cannot access a private member.
In your code you are reinterpreting the address of cc as a pointer to an int. You wrote it the C-style way; the C++-style way would have been int* pp = reinterpret_cast<int*>(&cc);. A reinterpret_cast is always a warning that you are casting between two unrelated pointer types. In such a case you must make sure that what you are doing is right, and you must know the underlying memory layout. The compiler does not prevent you from doing so, because this is often needed.
When doing the cast you throw away all knowledge about the class. From now on the compiler only sees an int pointer. Of course you can access the memory the pointer points to. In your case, on your platform, the compiler happened to put cc in the first n bytes of a TestClass object, so a TestClass pointer also points to the cc member.
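To make the difference concrete, here is a small sketch built on the question's TestClass. The value() accessor and the helpers overwrite_first_int and demo are hypothetical additions, there only so the effect can be observed.

```cpp
#include <cassert>

class TestClass {
private:
    int cc;
public:
    TestClass(int i) : cc(i) {}
    int value() const { return cc; }  // accessor added for this sketch
};

// int* bad = &obj.cc;  // rejected by the compiler: cc is private

// Writes v over the first int-sized bytes of the object. After the
// cast, the compiler only sees an int pointer.
void overwrite_first_int(TestClass& obj, int v) {
    int* pp = reinterpret_cast<int*>(&obj);
    *pp = v;
}

// Example use: the private member changes without being named.
int demo() {
    TestClass t(5);
    overwrite_first_int(t, 70);
    assert(t.value() == 70);
    return t.value();
}
```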
A good reason is to allow compatibility with C but extra access safety on the C++ layer.
Consider:
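A sketch of what such a shared declaration could look like, assuming a struct S with two int members. The constructor and getters are hypothetical additions, and the snippet is compiled as C++ here; in a real shared header the includes and the static_assert would also need to be hidden from C.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical struct visible to both C and C++ translation units.
// C sees two plain ints; C++ sees the same ints, but private.
struct S {
#ifdef __cplusplus
private:
#endif
    int i, j;
#ifdef __cplusplus
public:
    S(int i_, int j_) : i(i_), j(j_) {}  // added for this sketch
    int get_i() const { return i; }
    int get_j() const { return j; }
#endif
};

// All data members share one access level, so S stays standard-layout.
static_assert(std::is_standard_layout<S>::value,
              "S must keep a C-compatible layout");
```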
By requiring that the C-visible S and the C++-visible S be layout-compatible, S can be used across the language boundary with the C++ side having greater access safety. The reinterpret_cast access safety subversion is an unfortunate but necessary corollary.
As an aside, the restriction on having all members with the same access control is because the implementation is permitted to rearrange members relative to members with different access control. Presumably some implementations put members with the same access control together, for the sake of tidiness; it could also be used to reduce padding, although I don't know of any compiler that does that.
Because of backwards compatibility with C, where you can do the same thing.
For all people wondering, here's why this is not UB and is actually allowed by the standard:
First, TestClass is a standard-layout class (§9 [class] p7).
And with that, you are allowed to reinterpret_cast a pointer to the class object to a pointer to its first member (§9.2 [class.mem] p20).
In your case, the C-style (int*) cast resolves to a reinterpret_cast (§5.4 [expr.cast] p4).
The whole purpose of reinterpret_cast (and a C-style cast is even more powerful than a reinterpret_cast) is to provide an escape path around safety measures.
This is because you are manipulating the memory where your class is located. In your case it just happens to store the private member at this memory location, so you change it. It is not a very good idea, because you do not know how the object will be stored in memory.