Is the following code valid C++, according to the standard (discounting the ...s)?
bool f(T& r)
{
    if(...)
    {
        r = ...;
        return true;
    }
    return false;
}
T x = (f(x) ? x : T());
It is known to compile in the GCC versions this project uses (4.1.2 and 3.2.3... don't even get me started...), but should it?
Edit: I added some details, for example what f() conceptually looks like in the original code. Basically, it's meant to initialize x under certain conditions.
Syntactically it is. However, if you try this
#include <iostream>
using namespace std;
typedef int T;
bool f(T& x)
{
    return true;
}
int main()
{
    T x = (f(x) ? x : T());
    cout << x;
}
it outputs some random junk.
However, if you modify
bool f(T& x)
{
    x = 10;
    return true;
}
then it outputs 10.
In the first case, f never assigns to x, so x still holds an indeterminate value when it is read (you never initialize it before using it). In the second case, f assigns 10 to x through the reference before x is read, so the value read is the one you assigned.
I think your question is similar to this one:
Using newly declared variable in initialization (int x = x+1)?
It undoubtedly should compile, but may conditionally lead to undefined behavior.
- If T is a non-primitive type, undefined behavior if it is assigned.
- If T is a primitive type, well-defined behavior if it is non-local, and undefined behavior if it is not assigned before reading (except for unsigned character types, where it is defined to give an unspecified value).
The relevant part of the Standard is this rule from 3.8, Object lifetime:
The lifetime of an object of type T begins when:
- storage with the proper alignment and size for type T is obtained, and
- if the object has non-trivial initialization, its initialization is complete.
So the lifetime of x hasn't started yet. In the same section, we find the rule that governs using x:
Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways. For an object under construction or destruction, see 12.7. Otherwise, such a glvalue refers to allocated storage (3.7.4.2), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:
- an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
- the glvalue is used to access a non-static data member or call a non-static member function of the object, or
- the glvalue is bound to a reference to a virtual base class (8.5.3), or
- the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
If your type is non-primitive, then trying to assign it is actually a call to T::operator=, a non-static member function. Full stop: that is undefined behavior according to case 2.
Primitive types are assigned without invoking a member function, so let's now take a closer look at section 4.1, Lvalue-to-rvalue conversion, to see when exactly that lvalue-to-rvalue conversion will be undefined behavior:
When an lvalue-to-rvalue conversion occurs in an unevaluated operand or a subexpression thereof (Clause 5) the value contained in the referenced object is not accessed. In all other cases, the result of the conversion is determined according to the following rules:
- If T is (possibly cv-qualified) std::nullptr_t, the result is a null pointer constant (4.10).
- Otherwise, if T has a class type, the conversion copy-initializes a temporary of type T from the glvalue and the result of the conversion is a prvalue for the temporary.
- Otherwise, if the object to which the glvalue refers contains an invalid pointer value (3.7.4.2, 3.7.4.3), the behavior is implementation-defined.
- Otherwise, if T is a (possibly cv-qualified) unsigned character type (3.9.1), and the object to which the glvalue refers contains an indeterminate value (5.3.4, 8.5, 12.6.2), and that object does not have automatic storage duration or the glvalue was the operand of a unary & operator or it was bound to a reference, the result is an unspecified value.
- Otherwise, if the object to which the glvalue refers contains an indeterminate value, the behavior is undefined.
- Otherwise, the value contained in the object indicated by the glvalue is the prvalue result.
(Note that these rules reflect a rewrite for the upcoming C++14 standard, intended to make them easier to understand, but I don't think there's an actual change in the behavior here.)
Your variable x has[1] an indeterminate value at the time an lvalue reference is made and passed to f(). As long as that variable has primitive type and its value is assigned before it is read (a read is an lvalue-to-rvalue conversion), the code is fine.
If the variable isn't assigned before being read, the effect depends on T. Unsigned character types will produce code that executes and uses an arbitrary but legal character value. All other types cause undefined behavior.
[1] Unless x has static storage duration, for example a global variable. In that case it is zero-initialized before execution, according to section 3.6.2 Initialization of non-local variables:
Variables with static storage duration (3.7.1) or thread storage duration (3.7.2) shall be zero-initialized (8.5) before any other initialization takes place.
With static storage duration, it is not possible to run into an lvalue-to-rvalue conversion of an indeterminate value. But zero-initialization does not produce a valid state for all types, so still be careful of that.
Although scope plays a role, the real issue is object lifetime: more precisely, when the lifetime of an object with non-trivial initialization begins.
This is closely related to Can initializing expression use the variable itself? and Is passing a C++ object into its own constructor legal?. My answers to those questions do not neatly answer this one, though, so it does not seem like a duplicate.
The key portion of the draft C++ standard we are concerned with here is section 3.8 [basic.life], which says:
The lifetime of an object is a runtime property of the object. An object is said to have non-trivial initialization
if it is of a class or aggregate type and it or one of its members is initialized by a constructor other than a trivial
default constructor. [ Note: initialization by a trivial copy/move constructor is non-trivial initialization. —
end note ] The lifetime of an object of type T begins when:
- storage with the proper alignment and size for type T is obtained, and
- if the object has non-trivial initialization, its initialization is complete.
So in this case we satisfy the first bullet, storage has been obtained.
The second bullet is where we find trouble:
- Do we have non-trivial initialization?
- And if so, is the initialization complete?
Non-trivial initialization case
We can find the basic reasoning in defect report 363, which asks:
And if so, what is the semantics of the self-initialization of UDT?
For example
#include <stdio.h>
struct A {
A() { printf("A::A() %p\n", this); }
A(const A& a) { printf("A::A(const A&) %p %p\n", this, &a); }
~A() { printf("A::~A() %p\n", this); }
};
int main()
{
A a=a;
}
can be compiled and prints:
A::A(const A&) 0253FDD8 0253FDD8
A::~A() 0253FDD8
and the proposed resolution was:
3.8 [basic.life] paragraph 6 indicates that the references here are valid. It's permitted to take the address of a class object before it
is fully initialized, and it's permitted to pass it as an argument to
a reference parameter as long as the reference can bind directly.
[...]
So before the lifetime of an object begins, we are limited in what we can do with it. We can see from the defect report that binding a reference to x is valid as long as it binds directly.
What we can do is covered in section 3.8, paragraph 6 (the same section and paragraph the defect report quotes), which says:
Similarly, before the lifetime of an object has started but after the
storage which the object will occupy has been allocated or, after the
lifetime of an object has ended and before the storage which the
object occupied is reused or released, any glvalue that refers to the
original object may be used but only in limited ways. For an object
under construction or destruction, see 12.7. Otherwise, such a glvalue
refers to allocated storage (3.7.4.2), and using the properties of the
glvalue that do not depend on its value is well-defined. The program
has undefined behavior if:
- an lvalue-to-rvalue conversion (4.1) is applied to such a glvalue,
- the glvalue is used to access a non-static data member or call a non-static member function of the object, or
- the glvalue is bound to a reference to a virtual base class (8.5.3), or
- the glvalue is used as the operand of a dynamic_cast (5.2.7) or as the operand of typeid.
In your case, the assignment below uses the glvalue to access a non-static data member or call a non-static member function, which is the second case in the list above:
r = ...;
So if T has non-trivial initialization, then this line invokes undefined behavior, and so would reading from r, which would also be an access (covered in defect report 1531).
If x has static storage duration, it will be zero-initialized, but as far as I can tell this does not count as its initialization being complete, since the constructor would be called during dynamic initialization.
Trivial initialization case
If T has trivial initialization, then the lifetime begins once storage is obtained, and writing to r is well-defined behavior. Note, though, that reading r before it has been initialized will invoke undefined behavior, since it would produce an indeterminate value. If x has static storage duration, then it is zero-initialized and we don't have this issue.
As for whether it should compile: in either case, whether or not you are invoking undefined behavior, this is allowed to compile. The compiler is not obligated to produce a diagnostic for undefined behavior, although it may. It is only obligated to produce a diagnostic for ill-formed code, which none of the troublesome cases here are.