According to this Stack Overflow answer about the C++11/14 strict aliasing rules:
If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
- the dynamic type of the object,
- a cv-qualified version of the dynamic type of the object,
- a type similar (as defined in 4.4) to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to the dynamic type of the object,
- a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type of the object,
- an aggregate or union type that includes one of the aforementioned types among its elements or non-static data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
- a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
- a char or unsigned char type.
Can we access the storage of an object of another type through

(1) char *
(2) char(&)[N]
(3) std::array<char, N> &

without relying on undefined behavior?
```cpp
#include <array>
#include <cstdint>
#include <iostream>
#include <iterator>
#include <memory>
#include <string>
#include <string_view>

constexpr std::uint64_t lil_endian = 0x65'6e'64'69'61'6e;
// a.k.a. Clockwise-Rotated Endian, which (on a little-endian target) is laid out like
// char[8] = { n,a,i,d,n,e,\0,\0 }

const auto& arr =  // std::array<char,8> &
    reinterpret_cast<const std::array<char,8> &>(lil_endian);
const auto& carr = // char(&)[8]
    reinterpret_cast<const char(&)[8]>(lil_endian);
const auto* p =    // char *
    reinterpret_cast<const char *>(std::addressof(lil_endian));

int main()
{
    // Read the bytes back to front, skipping the two trailing '\0' bytes.
    const auto str1 = std::string(arr.crbegin() + 2, arr.crend());
    const auto str2 = std::string(std::crbegin(carr) + 2, std::crend(carr));
    const auto sv3r = std::string_view(p, 8);
    const auto str3 = std::string(sv3r.crbegin() + 2, sv3r.crend());

    // Print the string, its size, then each character with its hex code.
    auto lam = [](const auto& str) {
        std::cout << str << '\n'
                  << str.size() << '\n' << '\n' << std::hex;
        for (const auto ch : str) {
            std::cout << ch << " : " << static_cast<std::uint32_t>(ch) << '\n';
        }
        std::cout << '\n' << '\n' << std::dec;
    };
    lam(str1);
    lam(str2);
    lam(str3);
}
```
All three lambda invocations produce:
```
endian
6

e : 65
n : 6e
d : 64
i : 69
a : 61
n : 6e
```
Live demo: godbolt.org/g/cdDTAM (with -fstrict-aliasing -Wstrict-aliasing=2 enabled)
The strict aliasing rule is in fact very simple: two objects with overlapping lifetimes cannot have overlapping storage regions unless one is a subobject of the other.(*)
Nevertheless, it is allowed to read the memory representation of an object. The memory representation of an object is a sequence of unsigned char ([basic.types]/4). Accordingly, in your example:
- lam(str1) is UB (undefined behavior);
- lam(str2) is UB (an array and its first element are not pointer-interconvertible);
- lam(str3) is not stated as UB in the standard; if you replace char by unsigned char, one could argue that you are reading the object representation (it is not fully defined either, but it should work on all compilers).

So using the third case and changing the declaration of p to const unsigned char* should always produce the expected result. For the other two cases, the code can work in this simple example, but it may break if the code is more complicated or with a newer compiler version.

(*) There are two exceptions to this rule: one for union members with a common initial sequence, and one for arrays of unsigned char or std::byte that provide storage for another object.

The char(&)[N] case and the std::array<char, N> case both result in undefined behavior, for the reason given in the passage you already quoted: neither char(&)[N] nor std::array<char, N> is the same type as char.

I am not sure about the char case, because the current standard does not explicitly say that an object can be viewed as an array of narrow characters (see here for further discussion).

Anyway, if you want to access the underlying bytes of an object, use std::memcpy: [basic.types]/2 explicitly guarantees that the underlying bytes of an object of trivially copyable type can be copied into an array of char or unsigned char, and that copying them back restores the object's original value.
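A minimal sketch of the two byte-level access paths endorsed in the answers, assuming a C++14 compiler and a trivially copyable type: reading the object representation through const unsigned char* (the first answer's suggested fix for case (3)) and copying the bytes out with std::memcpy (the second answer's recommendation). The helper names bytes_via_uchar_ptr, bytes_via_memcpy and from_bytes are hypothetical and do not come from the original posts.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>
#include <string>
#include <type_traits>

// Hypothetical helper: views the object representation of `obj` through
// const unsigned char*, as the first answer suggests for case (3).
// Bytes come out in memory order, so the result depends on endianness.
template <typename T>
std::string bytes_via_uchar_ptr(const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch assumes a trivially copyable type");
    const auto* first =
        reinterpret_cast<const unsigned char*>(std::addressof(obj));
    // Each unsigned char is converted to char when building the string.
    return std::string(first, first + sizeof(T));
}

// Hypothetical helper: copies the underlying bytes into a separate buffer
// with std::memcpy, the approach [basic.types]/2 spells out.
template <typename T>
std::array<unsigned char, sizeof(T)> bytes_via_memcpy(const T& obj) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch assumes a trivially copyable type");
    std::array<unsigned char, sizeof(T)> buf{};
    std::memcpy(buf.data(), std::addressof(obj), sizeof(T));
    return buf;
}

// Hypothetical helper: copies the bytes back into a fresh object; for a
// trivially copyable T this reproduces the original value.
// (This sketch also assumes T is default constructible.)
template <typename T>
T from_bytes(const std::array<unsigned char, sizeof(T)>& buf) {
    static_assert(std::is_trivially_copyable<T>::value,
                  "sketch assumes a trivially copyable type");
    T obj{};
    std::memcpy(std::addressof(obj), buf.data(), sizeof(T));
    return obj;
}
```

For the question's lil_endian constant on a little-endian target, both helpers yield the bytes n, a, i, d, n, e, \0, \0 in memory order; reversing them and skipping the two zero bytes gives "endian", as in the question. The memcpy route is the one the standard states explicitly; the pointer route relies on the first answer's reading of the object representation.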