I was reading about strict aliasing, but it's still kinda foggy and I am never sure where the line between defined and undefined behaviour is. The most detailed post I found concentrates on C. So it would be nice if you could tell me if this is allowed and what has changed since C++98/11/...
#include <iostream>
#include <cstring>
template <typename T> T transform(T t);
struct my_buffer {
    char data[128];
    unsigned pos;
    my_buffer() : pos(0) {}
    void rewind() { pos = 0; }
    template <typename T> void push_via_pointer_cast(const T& t) {
        *reinterpret_cast<T*>(&data[pos]) = transform(t);
        pos += sizeof(T);
    }
    template <typename T> void pop_via_pointer_cast(T& t) {
        t = transform( *reinterpret_cast<T*>(&data[pos]) );
        pos += sizeof(T);
    }
};
// actually do some real transformation here (and actually also needs an inverse)
// ie this restricts allowed types for T
template<> int transform<int>(int x) { return x; }
template<> double transform<double>(double x) { return x; }
int main() {
    my_buffer b;
    b.push_via_pointer_cast(1);
    b.push_via_pointer_cast(2.0);
    b.rewind();
    int x;
    double y;
    b.pop_via_pointer_cast(x);
    b.pop_via_pointer_cast(y);
    std::cout << x << " " << y << '\n';
}
Please don't pay too much attention to a possible out-of-bounds access and the fact that maybe there is no need to write something like that. I know that char* is allowed to point to anything, but I also have a T* that points to a char*. And maybe there is something else I am missing.
Here is a complete example also including push/pop via memcpy, which afaik isn't affected by strict aliasing.
TL;DR: Does the above code exhibit undefined behaviour (neglecting an out-of-bounds access for the moment)? If yes, why? Did anything change with C++11 or one of the newer standards?
Short answer:
You may not do this:
*reinterpret_cast<T*>(&data[pos]) =
until an object of type T has been constructed at the pointed-to address, which you can accomplish with placement new.
Even then, you might need to use std::launder in C++17 and later, since you access the created object (of type T) through a pointer &data[pos] of type char*.
"Direct" reinterpret_cast is allowed only in some special cases, e.g., when T is std::byte, char, or unsigned char.
Before C++17 I would use the memcpy-based solution. The compiler will likely optimize away any unnecessary copies.
Right, and that is a problem. While the pointer cast itself has defined behaviour, using it to access a non-existent object of type T is not.
Unlike C, C++ does not allow impromptu creation of objects*. You cannot simply assign to some memory location as type T and have an object of that type be created; you need an object of that type to be there already. This requires placement new. Previous standards were ambiguous on it, but currently, per [intro.object], an object is created by a definition, by a new-expression, when implicitly changing the active member of a union, or when a temporary object is created. Since you are not doing any of these things, no object is created.
Furthermore, C++ does not implicitly consider pointers to different objects at the same address as equivalent. Your &data[pos] computes a pointer to a char object. Casting it to T* does not make it point to any T object residing at that address, and dereferencing that pointer has undefined behaviour. C++17 adds std::launder, which is a way to let the compiler know that you want to access a different object at that address than the one you have a pointer to.
When you modify your code to use placement new and std::launder, and ensure you have no misaligned accesses (I presume you left that out for brevity), your code will have defined behaviour.
* There is discussion on allowing this in a future version of C++.
Aliasing is a situation when two names refer to the same object. Those names might be references or pointers.
It's important for the compiler to expect that if a value was written using one name, it will be accessible through another.
Now if the pointers are of unrelated types, there is no reason for the compiler to expect that they point at the same address. This is the simplest UB:
Simply put, strict aliasing means that the compiler expects names of unrelated types to refer to objects of different types, and thus to be located in separate storage units. Because the addresses used to access those storage units are de facto the same, the result of accessing the stored value is undefined and usually depends on optimization flags.
memcpy() circumvents that by taking the addresses as pointers to char and copying the stored data inside the library function.
Strict aliasing also applies to union members, which are described separately, but the reason is the same: writing to one member of a union doesn't guarantee that the values of the other members change. That doesn't apply to shared fields at the beginning of structs stored within a union. Thus, type punning through a union is prohibited. (Most compilers do not enforce this, for historical reasons and for the convenience of maintaining legacy code.)
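From the 2017 Standard: 6.10 (Lvalues and rvalues) lists the types of glvalue through which the stored value of an object may be accessed; access through any other type is undefined behaviour.
7.5 (qualification conversions) defines when two types are similar.
The outcome is: while you can reinterpret_cast the pointer to a different, unrelated and not similar type, you can't use that pointer to access the stored value.
reinterpret_cast also doesn't create the object it points to, and assigning a value to a non-existent object is UB, so you can't use the dereferenced result of the cast to store data either if the class it points to isn't trivial.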