I've read this article about C/C++ strict aliasing. I think the same applies to C++.
As I understand, strict aliasing is used to rearrange the code for performance optimization. That's why two pointers of different (and unrelated in C++ case) types cannot refer to the same memory location.
Does this mean that problems can occur only if memory is modified, apart from possible problems with memory alignment?
For example, consider handling a network protocol or de-serialization. I have a dynamically allocated byte array, and the packet struct is properly aligned within it. Can I reinterpret_cast it to my packet struct?
char const* buf = ...; // dynamically allocated
unsigned int i = *reinterpret_cast<unsigned int const*>(buf + shift); // [shift] satisfies alignment requirements; the cast keeps const because reinterpret_cast cannot remove it
The problem here is not strict aliasing so much as structure representation requirements.
First, it is safe to alias between char, signed char, or unsigned char and any one other type (in your case, unsigned int). This allows you to write your own memory-copy loops, as long as they're defined using a char type. This is authorized by C99 §6.5/7, which includes a character type among the lvalue types that may access an object's stored value. Similar language can be found in the C++0x draft N3242 §3.10/10, although it is not as clear when the 'dynamic type' of an object is assigned (I'd appreciate any further references on what the dynamic type is of a char array to which a POD object has been copied as a char array with proper alignment).
As such, aliasing is not a problem here. However, a strict reading of the standard indicates that a C++ implementation has a great deal of freedom in choosing a representation of an unsigned int.
As one random example, an unsigned int might be a 24-bit integer, represented in four bytes, with 8 padding bits interspersed; if any of these padding bits does not match a certain (constant) pattern, the value is viewed as a trap representation, and dereferencing the pointer may result in a crash. Is this a likely implementation? Perhaps not. But there have been, historically, systems with parity bits and other oddness, and so directly reading from the network into an unsigned int, by a strict reading of the standard, is not kosher.
Now, the problem of padding bits is a largely theoretical issue on most systems today, but it's worth noting. If you plan to stick to PC hardware, you don't really need to worry about it (but don't forget your ntohl calls: endianness is still a problem!).
Structures make it even worse, of course: alignment and padding depend on your platform. I have worked on an embedded platform in which all types have an alignment of 1, so no padding is ever inserted into structures. This can result in inconsistencies when using the same structure definitions on multiple platforms. You can either manually work out the byte offsets for data structure members and reference them directly, or use a compiler-specific alignment directive to control padding.
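Working out the byte offsets by hand could be sketched as follows. The wire layout, the PacketHeader struct, and the read_be16, read_be32 and parse_header helpers are all hypothetical, invented for illustration; the sketch assumes a big-endian (network order) protocol and a buffer of at least 8 bytes. Since every field is assembled from individual bytes with shifts, the result does not depend on the host's endianness or on how the compiler pads the struct.

```cpp
#include <cstdint>

// Hypothetical wire layout, all fields big-endian:
//   offset 0: 16-bit type
//   offset 2: 16-bit length
//   offset 4: 32-bit sequence number
struct PacketHeader {
    std::uint16_t type;
    std::uint16_t length;
    std::uint32_t sequence;
};

// Assemble a 16-bit value from two bytes, most significant byte first.
inline std::uint16_t read_be16(unsigned char const* p) {
    return static_cast<std::uint16_t>((p[0] << 8) | p[1]);
}

// Assemble a 32-bit value from four bytes, most significant byte first.
inline std::uint32_t read_be32(unsigned char const* p) {
    return (static_cast<std::uint32_t>(p[0]) << 24) |
           (static_cast<std::uint32_t>(p[1]) << 16) |
           (static_cast<std::uint32_t>(p[2]) << 8) |
            static_cast<std::uint32_t>(p[3]);
}

// Fill the native struct field by field from fixed wire offsets.
PacketHeader parse_header(unsigned char const* buf) {
    PacketHeader h;
    h.type = read_be16(buf + 0);
    h.length = read_be16(buf + 2);
    h.sequence = read_be32(buf + 4);
    return h;
}
```

With this approach the in-memory layout of PacketHeader never has to match the wire format, at the cost of writing the offsets out explicitly.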
So you must be careful when directly casting from a network buffer to native types or structures. But the aliasing itself is not a problem in this case.
Actually, this code already has UB at the point where you dereference the reinterpret_casted integer pointer, without even needing to invoke the strict-aliasing rules. Not only that, but if you aren't rather careful, reinterpreting directly to your packet structure could cause all sorts of issues depending on struct packing and endianness.
Given all that, and given that you're already invoking UB, I suspect that it's "likely to work" on multiple compilers, and you're free to take that (possibly measurable) risk.
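One commonly suggested way to avoid the dereference altogether is to copy the bytes into a real object with std::memcpy. The following is a minimal sketch, assuming C++11 and a trivially copyable target type; read_object is a hypothetical helper name, and the caller is assumed to guarantee that the buffer holds at least sizeof(T) bytes starting at shift.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical helper: copies sizeof(T) bytes from buf + shift into a
// freshly created T instead of reinterpreting the buffer in place.
template <typename T>
T read_object(char const* buf, std::size_t shift) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "read_object requires a trivially copyable type");
    T value;
    std::memcpy(&value, buf + shift, sizeof(T));  // no alignment requirement on buf + shift
    return value;
}
```

A call such as read_object<unsigned int>(buf, shift) would replace the cast in the question. It does not address struct packing or endianness, which still have to be handled separately.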