
On how to recognize Rvalue or Lvalue reference and if-it-has-a-name rule

Posted 2019-01-08 22:33

Question:

I was reading Thomas Becker's article on rvalue references and their use. In it he defines what he calls the if-it-has-a-name rule:

Things that are declared as rvalue reference can be lvalues or rvalues. The distinguishing criterion is: if it has a name, then it is an lvalue. Otherwise, it is an rvalue.

This sounds very reasonable to me. It also clearly identifies the rvalueness of an rvalue reference.

My questions are:

  1. Do you agree with this rule? If not, can you give an example where this rule is violated?
  2. If there are no violations of this rule, can we use it to define the rvalueness/lvalueness of an expression?

Answer 1:

This is one of the most common "rules of thumb" used to explain the difference between lvalues and rvalues.

The situation in C++ is much more complex than that, so this can't be anything but a rule of thumb. I'll try to summarize a couple of concepts and make it clear why this issue is so complex in the C++ world. First, let's recap a bit what happened once upon a time.

At the beginning there was C

First, what did "lvalue" and "rvalue" originally mean in the world of programming languages in general?

In a simpler language like C or Pascal, the terms referred to what could be placed on the Left or on the Right of an assignment operator.

In a language like Pascal, where assignment is not an expression but only a statement, the difference is pretty clear and is defined in grammatical terms: an lvalue is the name of a variable or an array subscript.

That's because only these two things could stand on the left of an assignment:

i := 42; (* ok *)
a[i] := 42; (* ok *)
42 := 42; (* no sense *)

In C, the same distinction applies, and it is still pretty much grammatical, in the sense that you could look at a line of code and tell whether an expression would produce an lvalue or an rvalue.

i = 42; // ok, a variable
*p = 42; // ok, a pointer dereference
a[i] = 42; // ok, a subscript (which is a pointer dereference anyway)
s->var = 42; // ok, a struct member access

So what changed in C++?

Little languages grow up

In C++ things become much more complex: the difference is no longer grammatical but involves the type-checking process, for two reasons:

  • Anything can appear on the left of an assignment, as long as its type has a suitable overload of operator=
  • References

So in C++ you can't tell whether an expression will produce an lvalue just by looking at its grammatical structure. For example:

f() = g();

is a statement that would make no sense in C but can be perfectly legal in C++ if, for example, f() returns a reference. That's how expressions like v[i] = j work for std::vector: operator[] returns a reference to the element, so you can assign to it.

So what's the point of distinguishing lvalues from rvalues anymore? The distinction is still relevant for built-in types, of course, but also for deciding what can be bound to a non-const reference.

That's because you don't want to have legal code like:

int &x = 42;
x = 0; // Have we changed the meaning of a natural number??

So the language specifies carefully what is an lvalue and what isn't, and then says that only lvalues can be bound to non-const references. The code above is therefore not legal: an integer literal is not an lvalue, so a non-const reference cannot be bound to it.

Note that const references are different, since they can bind to literals and temporaries (and local references even extend the lifetime of those temporaries):

int const&x = 42; // It's ok

So far we've only covered what already happened in C++98, and the rules were already more complex than "if it has a name, it's an lvalue", because references have to be taken into account. For example, an expression returning a non-const reference is still considered an lvalue, even though it has no name.

Also, other rules of thumb mentioned here don't work in all cases either. For example: "if you can take its address, it's an lvalue". If by "taking the address" you mean "applying operator&", then it might work, but don't fool yourself into thinking that you can never obtain the address of a temporary: the this pointer inside a temporary's member function, for example, points to it.

What changed in C++11

C++11 adds even more complexity by introducing the concept of an rvalue reference, that is, a reference that can be bound to an rvalue even if it is non-const. The fact that it can only be bound to rvalues makes it both safe and useful. I don't think it's necessary to explain why rvalue references are useful, so let's move on.

The point here is that we now have many more cases to consider. So what is an rvalue now? The Standard actually distinguishes between different kinds of rvalues in order to correctly specify the behavior of rvalue references, overload resolution, and template argument deduction in the presence of rvalue references. So we have terms like xvalue and prvalue, which make things even more complex.

What about our rules of thumb?

So "everything that has a name is an lvalue" can still be true, but it certainly isn't true that every lvalue has a name. A call to a function returning a non-const lvalue reference is an lvalue. A call to a function returning by value creates a temporary and is an rvalue (a prvalue), and so is a call to a function returning an rvalue reference (an xvalue).

What about "temporaries are rvalues"? That's true, but non-temporaries can also be turned into rvalues simply by casting the type (which is what std::move does).

So I think all these rules are useful as long as we keep in mind what they are: rules of thumb. They'll always have some corner case where they don't apply, because to specify exactly what is an rvalue and what isn't, we can't avoid using the exact terms and rules of the Standard. That's what they were written for!



Answer 2:

While the rule covers the majority of cases, I can't agree with it in general:

Dereferencing an anonymous pointer produces something without a name, yet it is an lvalue:

foo(*new X);  // Not allowed if foo expects an rvalue reference (example of the article)

Based on the Standard, and taking into account the special case of temporary objects being rvalues, I'd suggest updating the second sentence of the rule:

" ... The criterion is: if it designates a function or an object which is not of temporary nature, then it's an lvalue. ... ".



Answer 3:

Question 1: That rule strictly refers to classifying expressions of rvalue reference type, not expressions in general. I almost agree with it in this context ("almost" because there's a bit more to it; see the quote below). The precise wording is in a note in the Standard [Clause 5 paragraph 7]:

In general, the effect of this rule is that named rvalue references are treated as lvalues and unnamed rvalue references to objects are treated as xvalues; rvalue references to functions are treated as lvalues whether named or not.

(emphases mine, for obvious reasons)


Question 2: As you can see from the other answers and comments (some nice examples in there), there are issues with general, concise statements about the value category of an expression. Here's the way I think about it.

We need to look at the problem from the other side: instead of trying to specify what expressions are lvalues, list the kinds that are rvalues; lvalues are everything else.

First, a couple of definitions to keep things clear:

  • An object means a region of storage for data, not a function and not a reference (it's the definition in the Standard).
  • When I say an expression generates something, I mean it doesn't just name it or refer to it, but actually constructs and returns it as the result of a combination of operators, function calls (possibly constructor calls) or casts (possibly implicit casts).

Now, based primarily on [3.10] (but also quite a few other places in the Standard), an expression is an rvalue if and only if it is one of the following:

  1. a value that is not associated with an object (like this, or literals such as 7, but not string literals);
  2. an expression that generates an object by value, a.k.a. a temporary object;
  3. an expression that generates an rvalue reference to an object;
  4. recursively, one of the following expressions using an rvalue:
    • x.y, where x is an rvalue and y is a non-static member object;
    • x.*y, where x is an rvalue and y is a pointer to a member object;
    • x[y], where either x or y is an rvalue of array type (using the built-in [] operator).

That's it.

Well, technically, the following special cases are also rvalues, but I don't think they're relevant in practice:

  5. a function call returning void, a cast to void, or a throw-expression (obviously not lvalues; I'm not sure why I'd ever be interested in their value category in practice);
  6. one of obj.mf, ptr->mf, obj.*pmf, or ptr->*pmf (mf is a non-static member function, pmf is a pointer to member function); here we're talking strictly about these forms, not the function call expressions that can be built with them, and you really can't do anything with these but make a function call, which is a different expression altogether (to which we need to apply the rules above).

And that's really it. Everything else is an lvalue. I find it easy enough to reason about expressions this way, as all categories above are easily recognizable. For example, it's easy to look at an expression, rule out the cases above, and decide it's an lvalue. Even for category 4, which has a longer description, the expressions are easily recognizable (I tried hard to make it a one-liner, but ultimately failed).

Expressions involving operators can be lvalues or rvalues depending on the exact operator being used. Built-in operators specify what happens in each case, but user-defined operator functions can change the rules. When determining the value category of an expression, both the structure of the expression and the types involved matter.


Notes:

  • Regarding category 1:
    • this in the example refers to this the pointer value, not *this.
    • String literals are lvalues because they're arrays of static storage duration, so they don't fit in category 1 (they're associated with objects).
  • Some examples related to categories 2 and 3:
    • Given the declaration int& f(int), the expression f(7) doesn't generate an object by value, so it doesn't fit in category 2; it does generate a reference, but it's not an rvalue reference, so category 3 doesn't apply either; the expression is an lvalue.
    • Given the declaration int&& f(int), the expression f(7) generates an rvalue reference; category 3 applies here, so the expression is an rvalue.
    • Given the declaration int f(int), the expression f(7) generates an object by value; category 2 applies here, the expression is an rvalue.
    • For casts, we can apply the same reasoning as for the three bullets above.
    • Given the declaration int&& a, using the expression a doesn't generate an rvalue reference; it just uses an identifier of reference type. Category 3 doesn't apply, the expression is an lvalue.
    • Lambda expressions generate closure objects by value - they are in category 2.
  • Some examples related to category 4:
    • x->y is translated to (*x).y. *x is an lvalue (it doesn't fit in any of the categories above). So, if y is a non-static member object, x->y is an lvalue (it doesn't fit in category 4 because of *x and it doesn't fit in 6 because that one only talks about member functions).
    • In x.y, if y is a static member, then category 4 doesn't apply. Such an expression is always an lvalue, even if x is an rvalue (6 doesn't apply either, because it talks about non-static member functions).
    • In x.y, if y is of type T& or T&&, then it's not a member object (remember, objects, not references, not functions), so category 4 doesn't apply. Such an expression is always an lvalue, even if x is an rvalue and even if y is an rvalue reference.
  • Category 4 used to be a bit different in C++11, but I believe this wording is correct for C++14. (If you insist on knowing, the result of subscripting into an rvalue array used to be an lvalue in C++11, but is an xvalue in C++14 - issue 1213.)
  • Further separating rvalues into xvalues and prvalues is relatively straightforward for C++14: categories 1, 2, 5 and 6 are prvalues, 3 and 4 are xvalues. Things were slightly different for C++11: category 4 was split between prvalues, xvalues and lvalues (changed as noted above, and also as part of the resolution of issue 616). This can be important, as it can affect the type you get back from decltype, for example.

All references are to N4140, the last C++14 draft before publication.

I first found the last two special rvalue cases here (everything's also in the Standard, of course, but harder to find). Note that not everything on that page is accurate for C++14. It also contains a very nice summary on the rationale behind the primary value categories (at the top).