My company uses a messaging server which gets a message into a const char*
and then casts it to the message type.
I've become concerned about this after asking this question. I'm not aware of any bad behavior in the messaging server. Is it possible that const
variables do not incur aliasing problems?
For example say that foo is defined in MessageServer in one of these ways:
- As a parameter: void MessageServer(const char* foo)
- Or as const variable at the top of MessageServer: const char* foo = PopMessage();

Now MessageServer is a huge function, but it never assigns anything to foo, however at 1 point in MessageServer's logic foo will be cast to the selected message type.
auto bar = reinterpret_cast<const MessageJ*>(foo);
bar will only be read from subsequently, but will be used extensively for object setup.
Is an aliasing problem possible here, or does the fact that foo is only initialized, and never modified save me?
EDIT:
Jarod42's answer finds no problem with casting from a const char* to a MessageJ*, but I'm not sure this makes sense.
We know this is illegal:
MessageX* foo = new MessageX;
const auto bar = reinterpret_cast<MessageJ*>(foo);
Are we saying this somehow makes it legal?
MessageX* foo = new MessageX;
const auto temp = reinterpret_cast<char*>(foo);
auto bar = reinterpret_cast<const MessageJ*>(temp);
My understanding of Jarod42's answer is that the cast to temp makes it legal.
EDIT:
I've gotten some comments with relation to serialization, alignment, network passing, and so on. That's not what this question is about.
This is a question about strict aliasing.
Strict aliasing is an assumption, made by the C (or C++) compiler, that dereferencing pointers to objects of different types will never refer to the same memory location (i.e. alias each other.)
What I'm asking is: Will the initialization of a const object, by casting from a char*, ever be optimized below where that object is cast to another type of object, such that I am casting from uninitialized data?
So my understanding is that you are doing something like that:
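A minimal sketch of that scenario, assuming hypothetical layouts for MessageJ and MessageX (only the member names id and type come from the discussion here; everything else, including read_id and demo_read_id, is made up for illustration):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical message layouts, assumed for illustration only.
struct MessageJ { std::uint32_t id; std::uint32_t payload; };
struct MessageX { std::uint32_t type; };

// foo is assumed to point at a live MessageJ object.
std::uint32_t read_id(const char* foo) {
    auto msgJ = reinterpret_cast<const MessageJ*>(foo);
    return msgJ->id;  // fine: the dynamic type of the object is MessageJ

    // By contrast, the following would be UB, because MessageX is unrelated
    // to the object's dynamic type:
    //   auto msgx = reinterpret_cast<const MessageX*>(foo);
    //   msgx->type;
}

// Usage: pass a real MessageJ through a const char*.
std::uint32_t demo_read_id() {
    MessageJ original{42u, 7u};
    const char* foo = reinterpret_cast<const char*>(&original);
    return read_id(foo);
}
```

The UB case is left commented out on purpose, since executing it proves nothing about what a compiler is allowed to do.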
If that is correct, then the expression msgJ->id is ok (as would be any access to foo), as msgJ has the correct dynamic type. msgx->type on the other hand does incur UB, because msgx has an unrelated type. The fact that the pointer to MessageJ was cast to const char* in between is completely irrelevant.

As was cited by others, here is the relevant part in the standard (the "glvalue" is the result of dereferencing the pointer):
As far as the discussion "cast to char*" vs "cast from char*" is concerned:

You might know that the standard doesn't talk about strict aliasing as such, it only provides the list above. Strict aliasing is one analysis technique based on that list for compilers to determine which pointers can potentially alias each other. As far as optimizations are concerned, it doesn't make a difference if a pointer to a MessageJ object was cast to char* or vice versa. The compiler cannot (without further analysis) assume that a char* and MessageX* point to distinct objects and will not perform any optimizations (e.g. reordering) based on that.

Of course that doesn't change the fact that accessing a char array via a pointer to a different type would still be UB in C++ (I assume mostly due to alignment issues) and the compiler might perform other optimizations that could ruin your day.
EDIT:
No, it will not. Aliasing analysis doesn't influence how the pointer itself is handled, only how accesses through that pointer are handled. The compiler will NOT reorder the write access (store memory address in the pointer variable) with the read access (copy to other variable / load of address in order to access the memory location) to the same variable.
So long as you mean that it does a reinterpret_cast (or a C-style cast that devolves to a reinterpret_cast):
and later takes that same pointer and reinterpret_cast's it back to the actual underlying type, then that process is completely legitimate:
Note that I specifically dropped the const's from your examples as their presence or absence doesn't matter. The above is legitimate when the underlying object that foo points at actually is a MessageJ, otherwise it is undefined behavior. The reinterpret_cast'ing to char* and back again yields the original typed pointer. Indeed, you could reinterpret_cast to a pointer of any type and back again and get the original typed pointer (provided the intermediate type's alignment requirement is no stricter than the original's, which always holds for char). From this reference:

Effectively, reinterpret_cast'ing between pointers of different types simply instructs the compiler to reinterpret the pointer as pointing at a different type. More importantly for your example though, round-tripping back to the original type again and then operating on it is safe. That is because all you've done is instructed the compiler to reinterpret a pointer as pointing at a different type and then told the compiler again to reinterpret that same pointer as pointing back at the original, underlying type.
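The round trip described above can be sketched as follows. MessageJ's layout and the helper name are assumptions made for the example:

```cpp
#include <cassert>

// Hypothetical message type, assumed for illustration only.
struct MessageJ { int id; };

// Round trip: MessageJ* -> char* -> MessageJ*.
// Returns true if the pointer that comes back is the original one
// and still reads the original object.
bool round_trip_preserves_pointer() {
    MessageJ msg{17};
    char* raw = reinterpret_cast<char*>(&msg);          // reinterpret as bytes
    MessageJ* back = reinterpret_cast<MessageJ*>(raw);  // back to the real type
    return back == &msg && back->id == 17;
}
```

Nothing is ever read through raw here. The char* only carries the address, which is the situation in the question.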
So, the round trip conversion of your pointers is legitimate, but what about potential aliasing problems?
The strict aliasing rule allows compilers to assume that references (and pointers) to unrelated types do not refer to the same underlying memory. This assumption allows lots of optimizations because it decouples operations on unrelated reference types as being completely independent.
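As a sketch of the kind of function this reasoning applies to (the signature and values are assumptions, chosen so that x and y point to unrelated types):

```cpp
#include <cassert>

// Hypothetical example: int and short are unrelated types, so under the
// strict aliasing rule the compiler may assume *x and *y are different objects.
int foo(int* x, short* y) {
    *x = -1;
    *y = 0;
    return *x;  // may be folded to the constant -1 without reloading *x
}

// Well-defined usage: the two pointers really do refer to distinct objects.
int call_with_distinct_objects() {
    int i = 5;
    short s = 5;
    return foo(&i, &s);
}
```

Calling foo with both pointers aimed at the same int (via a cast to short*) would be the undefined case. The usage function deliberately avoids that.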
In this example, thanks to the strict aliasing rule, the compiler can assume in foo that setting *y cannot affect the value of *x. So, it can decide to just return -1 as a constant, for example. Without the strict aliasing rule, the compiler would have to assume that altering *y might actually change the value of *x. Therefore, it would have to enforce the given order of operations and reload *x after setting *y. In this example it might seem reasonable enough to enforce such paranoia, but in less trivial code doing so will greatly constrain reordering and elimination of operations and force the compiler to reload values much more often.

Here are the results on my machine when I compile the above program differently (Apple LLVM v6.0 for x86_64-apple-darwin14.1.0):
In your first example, foo is a const char * and bar is a const MessageJ * reinterpret_cast'ed from foo. You further stipulate that the object's underlying type actually is a MessageJ and that no reads are done through the const char *. Instead, it is only cast to the const MessageJ * from which only reads are then done. Since you neither read nor write through the const char * alias, there can be no aliasing optimization problem with your accesses through your second alias in the first place. This is because there are no potentially conflicting operations performed on the underlying memory through your aliases of unrelated types. However, even if you did read through foo, then there could still be no potential problem as such accesses are allowed by the type aliasing rules (see below) and any ordering of reads through foo or bar would yield the same results because there are no writes occurring here.

Let us now drop the const qualifiers from your example and presume that MessageServer does do some write operations on bar and furthermore that the function also reads through foo for some reason (e.g. - prints a hex dump of memory). Normally, there might be an aliasing problem here as we have reads and writes happening through two pointers to the same memory through unrelated types. However, in this specific example, we are saved by the fact that foo is a char*, which gets special treatment by the compiler:

The strict-aliasing optimizations that are allowed for operations through references (or pointers) of unrelated types are specifically disallowed when a char reference (or pointer) is in play. The compiler instead must be paranoid that operations through the char reference (or pointer) can affect and be affected by operations done through other references (or pointers). In the modified example where reads and writes operate on both foo and bar, you can still have defined behavior because foo is a char*. Therefore, the compiler is not allowed to optimize to reorder or eliminate operations on your two aliases in ways that conflict with the serial execution of the code as written. Similarly, it is forced to be paranoid about reloading values that may have been affected by operations through either alias.

The answer to your question is that, so long as your functions are properly round tripping pointers to a type through a char* back to its original type, then your function is safe, even if you were to interleave reads (and potentially writes, see caveat at end of EDIT) through the char* alias with reads+writes through the underlying type alias.

These two technical references (3.10.10) are useful for answering your question. These other references help give a better understanding of the technical information.
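The hex dump scenario mentioned above can be sketched like this. MessageJ's layout and the helper names hex_dump and dump_tracks_writes are assumptions made for the example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical message type, assumed for illustration only.
struct MessageJ { std::uint32_t id; };

// Reads n bytes through a char pointer and formats them as lowercase hex.
std::string hex_dump(const char* bytes, std::size_t n) {
    std::string out;
    char buf[3];
    for (std::size_t i = 0; i < n; ++i) {
        unsigned value = static_cast<unsigned char>(bytes[i]);
        std::snprintf(buf, sizeof buf, "%02x", value);
        out += buf;
    }
    return out;
}

// Interleaves a write through the real type with reads through the char alias.
bool dump_tracks_writes() {
    MessageJ msg{0u};
    MessageJ* bar = &msg;
    const char* foo = reinterpret_cast<const char*>(bar);

    std::string before = hex_dump(foo, sizeof msg);  // read via char alias
    bar->id = 0xFFFFFFFFu;                           // write via MessageJ
    std::string after = hex_dump(foo, sizeof msg);   // must observe the write

    return before == "00000000" && after == "ffffffff";
}
```

The values 0 and 0xFFFFFFFF were picked so that the expected dump does not depend on byte order.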
====
EDIT: In the comments below, zmb objects that while char* can legitimately alias a different type, that the converse is not true as several sources seem to say in varying forms: that the char* exception to the strict aliasing rule is an asymmetric, "one-way" rule.

Let us modify my above strict-aliasing code example and ask would this new version similarly result in undefined behavior?
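A sketch of what such a modified version could look like, assuming foo now takes an int* and a char* that may alias the same int (the exact body and the Result/run helpers are assumptions):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical variant: y is a char*, so it is allowed to alias *x.
int foo(int* x, char* y) {
    *x = -1;
    for (std::size_t i = 0; i < sizeof(int); ++i) {
        y[i] = 0;  // byte-wise write through the char alias
    }
    return *x;     // must be reloaded, since y may have changed it
}

struct Result { int a; int b; };

// Usage: both pointers deliberately refer to the same int.
Result run() {
    int a = -1;
    int b = foo(&a, reinterpret_cast<char*>(&a));
    return {a, b};
}
```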
I argue that this is defined behavior and that both a and b must be zero after the call to foo. From the C++ standard (3.10.10):

In the above program, I am accessing the stored value of an object through both its actual type and a char type, so it is defined behavior and the results have to comport with the serial execution of the code as written.
Now, there is no general way for the compiler to always statically know in foo that the pointer x actually aliases y or not (e.g. - imagine if foo was defined in a library). Maybe the program could detect such aliasing at run time by examining the values of the pointers themselves or consulting RTTI, but the overhead this would incur wouldn't be worth it. Instead, the better way to generally compile foo and allow for defined behavior when x and y do happen to alias one another is to always assume that they could (i.e. - disable strict alias optimizations when a char* is in play).

Here's what happens when I compile and run the above program:
This output is at odds with the earlier, similar strict-aliasing program's. This is not dispositive proof that I'm right about the standard, but the different results from the same compiler provide decent evidence that I may be right (or, at least, that one important compiler seems to understand the standard the same way).
Let's examine some of the seemingly conflicting sources:
The bolded bit is why this quote doesn't apply to the problem addressed by my answer nor the example I just gave. In both my answer and the example, the aliased memory is being accessed both through a char* and the actual type of the object itself, which can be defined behavior.

Again, the bolded bit is why this statement doesn't apply to my answers. In this and similar counter-examples, an array of characters is being accessed through a pointer of an unrelated type. Even in C, this is UB because the character array might not be aligned according to the aliased type's requirements, for example. In C++, this is UB because such access does not meet any of the type aliasing rules as the underlying type of the object actually is char.

In my examples, we first have a valid pointer to a properly constructed type that is then aliased by a char* and then reads and writes through these two aliased pointers are interleaved, which can be defined behavior. So, there seems to be some confusion and conflation out there between the strict aliasing exception for char and not accessing an underlying object through an incompatible reference.

The standard and many examples clearly state that "write through q, then read through p (or value)" can be well defined behavior. What is not as abundantly clear, but what I'm arguing for here, is that "write through p (or value), then read through q" is always well defined. I claim even further, that "reads and writes through p (or value) can be arbitrarily interleaved with reads and writes to q" with well defined behavior.
Now there is one caveat to the previous statement and why I kept sprinkling the word "can" throughout the above text. If you have a type T reference and a char reference that alias the same memory, then arbitrarily interleaving reads+writes on the T reference with reads on the char reference is always well defined. For example, you might do this to repeatedly print out a hex dump of the underlying memory as you modify it multiple times through the T reference. The standard guarantees that strict aliasing optimizations will not be applied to these interleaved accesses, which otherwise might give you undefined behavior.

But what about writes through a char reference alias? Well, such writes may or may not be well defined. If a write through the char reference violates an invariant of the underlying T type, then you can get undefined behavior. If such a write improperly modified the value of a T member pointer, then you can get undefined behavior. If such a write modified a T member value to a trap value, then you can get undefined behavior. And so on. However, in other instances, writes through the char reference can be completely well defined. Rearranging the endianness of a uint32_t or uint64_t by reading+writing to them through an aliased char reference is always well defined, for example. So, whether such writes are completely well defined or not depends on the particulars of the writes themselves. Regardless, the standard guarantees that its strict aliasing optimizations will not reorder or eliminate such writes w.r.t. other operations on the aliased memory in a manner that itself could lead to undefined behavior.

There is no aliasing problem as you use (const) char* type, see the last point of:

The other answer answered the question well enough (it's a direct quotation from the C++ standard in https://isocpp.org/files/papers/N3690.pdf page 75), so I'll just point out other problems in what you're doing.
Note that your code may run into alignment problems. For example, if the alignment of MessageJ is 4 or 8 bytes (typical on 32-bit and 64-bit machines), strictly speaking, it is undefined behaviour to access an arbitrary character array pointer as a MessageJ pointer.
You are unlikely to see problems on x86/AMD64 architectures, as the hardware tolerates most unaligned accesses (the behaviour is still undefined, so the compiler remains free to assume alignment). However, someday you may find that the code you're developing is ported to a mobile ARM architecture and the unaligned access would be a problem then.
It therefore seems you're doing something you shouldn't be doing. I would consider using serialization instead of accessing a character array as a MessageJ type. Alignment is not the only potential problem: the data may also have a different representation on 32-bit and 64-bit architectures.
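One common way to sidestep the alignment concern is to copy the bytes into a properly typed object instead of casting the pointer. A minimal sketch, assuming MessageJ is trivially copyable and that sender and receiver agree on its layout (the struct and function names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical, trivially copyable message type, assumed for illustration.
struct MessageJ { std::uint32_t id; std::uint32_t length; };

// Copies the bytes into a real MessageJ, so no MessageJ access ever goes
// through a possibly misaligned pointer.
MessageJ decode_message_j(const char* bytes) {
    MessageJ msg;
    std::memcpy(&msg, bytes, sizeof msg);
    return msg;
}

// Usage: decode from a deliberately misaligned position in a byte buffer.
bool decode_from_unaligned_buffer() {
    MessageJ source{7u, 99u};
    char buffer[sizeof(MessageJ) + 1];
    std::memcpy(buffer + 1, &source, sizeof source);  // offset by one byte
    MessageJ copy = decode_message_j(buffer + 1);
    return copy.id == 7u && copy.length == 99u;
}
```

This only addresses alignment and aliasing. It does nothing about representation differences between architectures, which would need a real serialization format with fixed field sizes and byte order.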
First of all, casting pointers does not cause any aliasing violations (although it might cause alignment violations).
Aliasing refers to the process of reading or writing an object through a glvalue of different type than the object.
If an object has type T, and we read/write it via an X& and a Y& then the questions are:
- Can X alias T?
- Can Y alias T?

It does not directly matter whether X can alias Y or vice versa, as you seem to focus on in your question. However, if X and Y are completely incompatible, the compiler can infer that there is no type T that can be aliased by both X and Y, and can therefore assume that the two references refer to different objects.

So, to answer your question, it all hinges on what PopMessage does. If the code is something like:

then it is fine to write:

and so on. The const has nothing to do with it. In fact if you did not use const here (or you cast it away) then you could also write through bar and ptr with no problem.

On the other hand, if PopMessage was something like:

then the line auto baz = *bar; would cause UB because char cannot be aliased by MessageJ. Note that you can use placement-new to change the dynamic type of an object (in that case, char buf[200] is said to have stopped existing, and the new object created by placement-new exists and its type is T).
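The two well-defined ways a PopMessage-like function could hand out its pointer can be sketched as follows. The function names and the MessageJ layout are assumptions; only PopMessage, bar, and the 200-byte char buffer come from the discussion above:

```cpp
#include <cassert>
#include <new>

// Hypothetical message type, assumed for illustration only.
struct MessageJ { int id; };

// Variant 1: the storage really is a MessageJ object.
const char* PopMessageBackedByObject() {
    static MessageJ msg{3};
    return reinterpret_cast<const char*>(&msg);
}

// Variant 2: the storage starts as a char buffer, but placement-new creates
// a MessageJ in it, so the dynamic type of the object is MessageJ.
const char* PopMessageBackedByBuffer() {
    alignas(MessageJ) static char buf[200];
    MessageJ* created = ::new (static_cast<void*>(buf)) MessageJ{5};
    return reinterpret_cast<const char*>(created);
}

// Reads through the pointer after casting it back to the real type.
int read_id(const char* foo) {
    auto bar = reinterpret_cast<const MessageJ*>(foo);
    return bar->id;
}
```

The problematic case, a plain char buffer filled with bytes and no MessageJ ever created in it, is intentionally not shown as runnable code.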