Which of these functions has UB (specifically, which violates the strict aliasing rule)?
    #include <cstdlib>
    #include <new>
    #include <vector>

    void a() {
        std::vector<char> v(sizeof(float));
        float *f = reinterpret_cast<float *>(v.data());
        *f = 42;
    }

    void b() {
        char *a = new char[sizeof(float)];
        float *f = reinterpret_cast<float *>(a);
        *f = 42;
    }

    void c() {
        char *a = new char[sizeof(float)];
        float *f = new(a) float;
        *f = 42;
    }

    void d() {
        char *a = (char*)std::malloc(sizeof(float));
        float *f = reinterpret_cast<float *>(a);
        *f = 42;
    }

    void e() {
        char *a = (char*)operator new(sizeof(float));
        float *f = reinterpret_cast<float *>(a);
        *f = 42;
    }
I'm asking because of this question.
I think that d doesn't have UB (otherwise malloc would be useless in C++). Because of this, it seems logical that b, c and e don't have it either. Am I wrong somewhere? Maybe b is UB, but c is not?
Preamble: storage and objects are different concepts in C++. Storage refers to memory space, and objects are entities with lifetimes that may be created and destroyed within a piece of storage. Storage may be reused to host multiple objects over time. All objects require storage, but there can be storage with no objects in it.
c is correct. Placement-new is one of the valid methods of creating an object in storage (C++14 [intro.object]/1), even if there were pre-existing objects in that storage. The old objects are implicitly destroyed by the reuse of the storage, and this is perfectly fine as long as they did not have non-trivial destructors ([basic.life]/4). The expression new(a) float creates an object of type float with dynamic storage duration within the existing storage ([expr.new]/1).
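To illustrate the storage/object distinction and storage reuse described above, here is a minimal sketch in which one buffer hosts an int and then a float. The function name reuse_storage and the 16-byte buffer size are illustrative assumptions, not taken from the answer.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical helper: one piece of storage hosts an int, then a float.
float reuse_storage() {
    // Storage only: an unsigned char array large and aligned enough for either type.
    alignas(std::max_align_t) unsigned char storage[16];
    static_assert(sizeof(int) <= sizeof(storage), "int must fit");
    static_assert(sizeof(float) <= sizeof(storage), "float must fit");

    int *i = new (storage) int(7);   // creates an int object in the buffer
    assert(*i == 7);                 // OK: an int glvalue refers to an int object

    // Reusing the storage for a float implicitly ends the int's lifetime.
    // int has a trivial destructor, so no destructor call is needed first.
    float *f = new (storage) float;  // default-initialized: value indeterminate
    *f = 42;                         // OK: f points to a float object
    // From here on, *i must not be used: the int object no longer exists.
    return *f;
}
```

The same pattern is what c does with the char array from new char[]: the placement-new expression, not the pointer cast, is what brings the float into existence.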
d and e are undefined by omission under the current object model rules: the effect of accessing memory via a glvalue expression is defined only when that expression refers to an object, not when it refers to storage containing no objects. (Note: please do not leave non-constructive comments regarding the obvious inadequacy of the existing definitions.)
This does not mean "malloc is useless"; the effect of malloc and operator new is to obtain storage. You can then create objects in that storage and use those objects. This is in fact exactly how standard allocators, and the new expression, work.
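Following that reasoning, d and e can be made well-defined by explicitly creating the float with placement new in the storage that malloc or ::operator new returns. This is a minimal sketch; d_fixed and e_fixed are hypothetical names, and returning the stored value is only there to make the result observable.

```cpp
#include <cassert>
#include <cstdlib>
#include <new>

// d() rewritten: malloc obtains storage, placement new creates the float in it.
float d_fixed() {
    void *storage = std::malloc(sizeof(float));
    if (!storage) throw std::bad_alloc();
    float *f = new (storage) float;  // a float object now exists in the storage
    *f = 42;
    float result = *f;
    // float is trivially destructible, so releasing the storage is enough.
    std::free(storage);
    return result;
}

// e() rewritten: ::operator new obtains storage, placement new creates the float.
float e_fixed() {
    void *storage = ::operator new(sizeof(float));  // throws std::bad_alloc on failure
    float *f = new (storage) float;
    *f = 42;
    float result = *f;
    ::operator delete(storage);
    return result;
}
```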
a and b are strict aliasing violations: a glvalue of type float is used to access objects of the incompatible type char ([basic.lval]/10).
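If the goal is only to put a float's bytes into char storage (as a and b try to do), a commonly used alternative that never accesses char objects through a float glvalue is std::memcpy. This technique is not part of the answer above; the sketch below uses hypothetical helper names store_float_bytes and load_float_bytes, and relies on float being trivially copyable, so copying its bytes out and back in restores the value.

```cpp
#include <cassert>
#include <cstring>
#include <vector>

// Copies the object representation of a float into char storage.
// Only bytes are copied; no float glvalue ever refers to the char objects.
std::vector<char> store_float_bytes(float value) {
    std::vector<char> v(sizeof(float));
    std::memcpy(v.data(), &value, sizeof(float));
    return v;
}

// Reads the value back by copying the bytes into a genuine float object.
// memcpy has no alignment requirement, so v.data() need not be float-aligned.
float load_float_bytes(const std::vector<char> &v) {
    float result;
    std::memcpy(&result, v.data(), sizeof(float));
    return result;
}
```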
There is a proposal that would make all of these cases well-defined (other than the alignment issue with a, discussed below): under this proposal, using *f implicitly creates an object of that type at that location, with some caveats.
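Note: there is no alignment problem in cases b through e, because the new-expression and ::operator new are guaranteed to allocate storage correctly aligned for any type ([new.delete.single]/1), and malloc likewise returns storage suitably aligned for any type with fundamental alignment, such as float.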
However, in the case of std::vector<char>, even though the standard specifies that ::operator new be called to obtain storage, it doesn't require the first vector element to be placed in the first byte of that storage; e.g. the vector could decide to allocate 3 extra bytes at the front and use them for bookkeeping.
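One way to sidestep the alignment question for vector storage is to over-allocate and let std::align (from <memory>) locate a suitably aligned address before using placement new. This is a sketch under that assumption, not something the answer prescribes; place_float_in_vector is a hypothetical name. Following the reasoning about c, creating the float ends the lifetime of the char elements it overlaps, so those elements should not be read through the vector afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <new>
#include <vector>

// Hypothetical helper: creates a float inside a std::vector<char>'s buffer
// without assuming that v.data() is suitably aligned for float.
float place_float_in_vector() {
    // Over-allocate by alignof(float) - 1 bytes so an aligned slot always fits.
    std::vector<char> v(sizeof(float) + alignof(float) - 1);
    void *p = v.data();
    std::size_t space = v.size();
    // std::align moves p forward to the next float-aligned address (shrinking
    // space accordingly), or returns nullptr if the object would not fit.
    void *aligned = std::align(alignof(float), sizeof(float), p, space);
    assert(aligned != nullptr);      // guaranteed by the padding above
    float *f = new (aligned) float;  // create a float object in those bytes
    *f = 42;
    // The overlapped char elements are no longer alive; don't read them via v.
    return *f;
}
```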
Even though it was a discussion between the OP and me that spawned this question, I'll still put my interpretation here.
I believe that all of these except c() contain strict aliasing violations as formally defined by the standard.
I base this on section 1.8, paragraph 1 of the standard ([intro.object]/1):
... An object is created by a definition (3.1), by a new-expression (5.3.4) or by the implementation (12.2) when needed. ...
reinterpret_cast<>-ing memory does not fall under any of these cases.
From cppreference:
Type aliasing
Whenever an attempt is made to read or modify the stored value of an object of type DynamicType through a glvalue of type AliasedType, the behavior is undefined unless one of the following is true:
- AliasedType and DynamicType are similar.
- AliasedType is the (possibly cv-qualified) signed or unsigned variant of DynamicType.
- AliasedType is std::byte (since C++17), char, or unsigned char: this permits examination of the object representation of any object as an array of bytes.
Informally, two types are similar if, after stripping away cv-qualifications at every level (but excluding anything inside a function type), they are the same type.
For example: [...some examples...]
Also from cppreference:
a glvalue is an expression whose evaluation determines the identity of an object, bit-field, or function;
The above is relevant for all examples except (c). The types are neither similar nor signed/unsigned variants of each other. Also, the AliasedType (the type you cast to) is none of char, unsigned char or std::byte. Hence all of them (except c) exhibit undefined behaviour.
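The quoted rule is asymmetric: a char, unsigned char or std::byte glvalue may examine the bytes of any object, but a float glvalue may not access char objects, which is what every example except c does. A minimal sketch of the permitted direction follows; bytes_match_memcpy is a hypothetical name, and the comparison against a std::memcpy copy only serves to make the result checkable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Permitted direction: an unsigned char glvalue examines the bytes of a float.
bool bytes_match_memcpy(float value) {
    const unsigned char *bytes = reinterpret_cast<const unsigned char *>(&value);
    unsigned char copy[sizeof(float)];
    std::memcpy(copy, &value, sizeof(float));
    for (std::size_t i = 0; i < sizeof(float); ++i) {
        if (bytes[i] != copy[i])  // reading a float's bytes through unsigned char
            return false;
    }
    return true;
}
```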
Disclaimer: first, cppreference is not an official reference; only the standard is. Second, unfortunately I am not 100% certain that my interpretation of what I read on cppreference is correct.