Do C++20's strict aliasing rules [basic.lval]/11 arbitrarily allow the following...
- cast between char* and char8_t*
string str = "string";
u8string u8str { (char8_t*) &*str.data() }; // c++20 u8string
u8string u8str2 = u8"zß水
A C-style cast is not the same thing as reinterpret_cast.

The standard sections I think are relevant to your question:
1. char8_t* --> char*: Yes. Because char is one of the types that all objects can be converted to. But the standard does not guarantee that the (dereferenced) converted values are equal for distinct types: char can be signed or not, and char8_t is unsigned. char8_t* --> unsigned char* is valid but should not guarantee that either, because the types are still distinct. But given that it's char8_t's underlying type, it should be, I guess?
2. char* --> char8_t*: No. As per 6.7.1.9 those types are distinct. Although an argument might be made that the "whose underlying type is unsigned char" part could apply, with unsigned char being explicitly allowed in 7.2.1.11.3, I don't think that would be the correct interpretation; being distinct should be the deciding factor. That is supported by the following quote of a comment in the proposal P0482R6 - char8_t: A type for UTF-8 characters and strings (Revision 6 - 2018-11-09) (I did not find a more recent revision):
3. uint32_t* <--> char32_t*, uint16_t* <--> char16_t*, uint16_t* <--> uint_least16_t*, uint32_t* <--> uint_least32_t*, uint_least32_t <--> char32_t, uint_least16_t <--> char16_t: No. Those pairs are all distinct, so 7.2.1.11.1 does not apply, and neither type is in 7.2.1.11.3, so not even the second part of 2. can be relevant.
4. unsigned char* --> char8_t*: No. By the same argument as in 2. It's not a T* -> T* cast, which is obviously allowed.
5. char8_t* --> unsigned char*: Yes. Because unsigned char is also one of the allowed types per 7.2.1.11.3. But I would still argue that the standard does not guarantee that the (dereferenced) converted values will be equal. But given that it's char8_t's underlying type, it doesn't have any options other than to be equal, I guess?

Just so we are on the same page, the C-style cast (T*) expression is equivalent to reinterpret_cast<T*>(expression) ([expr.cast]/4.4), which is equivalent to static_cast<T*>(static_cast<void*>(expression)) ([expr.reinterpret.cast]/7). This does nothing to the value of the pointer, as they are not pointer-interconvertible. (See [expr.static.cast]/13 and [basic.compound]/4.)

So yes, we would have to look at [basic.lval]/11 to see if it can be aliased. The reference must have a type which is similar to:
Which is not the case. Even though char8_t has the underlying type of unsigned char, it is not a similar type.

So, for example:
Though because of [basic.fundamental]/6, which says:

You can do reinterpret_cast<unsigned char*>(pointer-to-char8_t) and have all the values be equal, but that is the only case (and also char* iff char is unsigned; otherwise they may compare unequal, even for values < 128). For all other types, you can use this rule to memcpy:

The char*_t line of types does not have any special aliasing rules. Therefore, the standard rules apply. And those rules do not have exceptions for conversion between underlying types.

So most of what you did is UB. The one case that isn't UB is char, due to its special nature. You can in fact read the bytes of a char8_t as an array of char. But you can't do the opposite, reading the bytes of a char array as char8_t.

Now, these types are completely convertible to each other. So you can convert the values in those arrays to the other type anytime you want.
All that being said, on real implementations those things will almost certainly work. Well, until they don't, because you tried to change one thing through a thing that it's not supposed to be changed by, and the compiler doesn't reload the changed value because it assumed that it couldn't have been changed. So really, just use the correct, meaningful type.