C++03 5.1 Primary expressions
§2:
A literal is a primary expression. Its type depends on its form (2.13). A string literal is an lvalue; all other literals are rvalues.
What is the rationale behind this?
As I understand it, string literals are objects, while all other literals are not, and an lvalue always refers to an object.
But the question then is why are string literals objects while all other literals are not?
This rationale seems to me more like a chicken-and-egg problem.
I understand the answer may have more to do with hardware architecture than with C/C++ as programming languages; nevertheless, I would like to hear it.
Note: I am tagging this question as both c and c++ because the C99 standard has similar wording, specifically 6.5.1 paragraph 4.
A string literal is a literal with array type, and in C there is no way for an array type to exist in an expression except as an lvalue. String literals could have been specified to have pointer type (rather than an array type that usually decays to a pointer) pointing to the string "contents", but this would make them rather less useful; in particular, the sizeof operator could not be used to obtain their size.
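To make the sizeof point concrete, here is a minimal sketch. The function names are made up for illustration; the only thing it relies on is that "hello" has type array of 6 const char in C++.

```cpp
#include <cassert>
#include <cstddef>

// sizeof applied to the literal itself sees the array type:
// 5 characters plus the terminating '\0'.
std::size_t literal_array_size() {
    return sizeof("hello");
}

// Once the literal has decayed to a pointer, sizeof only sees the pointer.
std::size_t decayed_pointer_size() {
    const char* p = "hello";
    return sizeof(p);
}

// Small self-check of the two claims above.
void check_sizes() {
    assert(literal_array_size() == 6);
    assert(decayed_pointer_size() == sizeof(const char*));
}
```

If string literals had pointer type from the start, only the second result would be available.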
Note that C99 introduced compound literals, which are also lvalues, so having a literal be an lvalue is no longer a special exception; it's closer to being the norm.
String literals are arrays: objects of inherently unpredictable size (i.e. of user-defined and possibly large size). In the general case, there's simply no other way to represent such literals except as objects in memory, i.e. as lvalues. In C99 this also applies to compound literals, which are also lvalues.
Any attempt to artificially hide the fact that string literals are lvalues at the language level would produce a considerable number of completely unnecessary difficulties, since the ability to point to a string literal with a pointer, as well as the ability to access it as an array, relies critically on its lvalue-ness being visible at the language level.
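A short sketch of the operations that depend on this visible lvalue-ness, assuming C++ (where the literal's element type is const char). The function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Index directly into the literal, as with any other array lvalue.
char third_char() {
    return "hello"[2];
}

// Take the address of the whole literal: the result is a pointer to
// an array of 6 const char, not a pointer to a single char.
std::size_t pointed_to_array_size() {
    const char (*p)[6] = &"hello";
    return sizeof(*p);
}

// Bind a reference to the array itself; no decay to pointer takes place.
std::size_t bound_array_length() {
    const char (&ref)[6] = "hello";
    return sizeof(ref) / sizeof(ref[0]);
}

// Small self-check of the three operations above.
void check_array_access() {
    assert(third_char() == 'l');
    assert(pointed_to_array_size() == 6);
    assert(bound_array_length() == 6);
}
```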
Meanwhile, literals of scalar types have a fixed compile-time size. At the same time, such literals are very likely to be embedded directly into the machine instructions on the given hardware architecture. For example, when you write something like i = i * 5 + 2, the literal values 5 and 2 become explicit (or even implicit) parts of the generated machine code. They don't exist and don't need to exist as standalone locations in data storage. There's simply no point in storing the values 5 and 2 in data memory.
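The language-level side of this distinction can be observed directly, assuming a C++11 compiler (decltype and static_assert are newer than the C++03 text quoted in the question). For an expression that is not a plain name, decltype yields T& when the expression is an lvalue and plain T when it is a prvalue. The helper function names are made up for illustration.

```cpp
#include <cassert>
#include <type_traits>

// A string literal is an lvalue, so decltype gives a reference to an array.
static_assert(std::is_lvalue_reference<decltype("abc")>::value,
              "string literal is an lvalue");

// Scalar literals are prvalues, so decltype gives a non-reference type.
static_assert(!std::is_reference<decltype(5)>::value,
              "integer literal is not an lvalue");
static_assert(!std::is_reference<decltype(2.1)>::value,
              "floating-point literal is not an lvalue");
static_assert(!std::is_reference<decltype('x')>::value,
              "character literal is not an lvalue");

// Run-time views of the same facts.
bool string_literal_is_lvalue() {
    return std::is_lvalue_reference<decltype("abc")>::value;
}

bool int_literal_is_lvalue() {
    return std::is_lvalue_reference<decltype(5)>::value;
}

// Small self-check; note that an expression such as &5 would not compile at all.
void check_categories() {
    assert(string_literal_is_lvalue());
    assert(!int_literal_is_lvalue());
}
```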
It is also worth noting that on many (if not most) hardware architectures, floating-point literals are actually implemented as "hidden" lvalues (even though the language does not expose them as such). On platforms like x86, the floating-point instructions do not support embedded immediate operands. This means that virtually every floating-point literal has to be stored in (and read from) data memory by the compiler. E.g. when you write something like i = i * 5.5 + 2.1 it is translated into something like
const double unnamed_double_5_5 = 5.5;
const double unnamed_double_2_1 = 2.1;
i = i * unnamed_double_5_5 + unnamed_double_2_1;
In other words, floating-point literals often end up becoming "unofficial" lvalues internally. However, it makes perfect sense that the language specification did not make any attempt to expose this implementation detail. At the language level, arithmetic literals make more sense as rvalues.
An lvalue in C++ does not always refer to an object. It can refer to a function too. Moreover, objects do not have to be referred to by lvalues. They may be referred to by rvalues, and this includes arrays (in C++ and C). However, in old C89, the array-to-pointer conversion did not apply to rvalue arrays.
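The first claim, that an lvalue can designate a function rather than an object, can be checked with a small sketch, again assuming C++11 for decltype and static_assert. The function names answer and call_through_reference are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

int answer() { return 42; }

// The parenthesized name is an lvalue expression, so decltype yields
// a reference to a function type.
static_assert(std::is_lvalue_reference<decltype((answer))>::value,
              "a function name is an lvalue");

// What it designates is a function, not an object.
static_assert(std::is_function<decltype(answer)>::value,
              "answer has function type");
static_assert(!std::is_object<decltype(answer)>::value,
              "a function is not an object");

// Because the name is an lvalue, an lvalue reference can bind to it.
int call_through_reference() {
    int (&f)() = answer;
    return f();
}

// Small self-check of the call through the reference.
void check_function_lvalue() {
    assert(call_through_reference() == 42);
}
```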
Now, an rvalue denotes something with no lifetime, a limited lifetime, or a lifetime that is about to expire. A string literal, however, lives for the entire program. So string literals being lvalues is exactly right.
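A minimal sketch of the lifetime point: because the array behind a string literal has static storage duration, a pointer to it stays valid after the function that produced it has returned. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Safe: the array behind "hello" lives for the whole program,
// so the returned pointer does not dangle.
const char* greeting() {
    return "hello";
}

// By contrast, returning a pointer to a local array such as
//     char buf[] = "hello"; return buf;
// would dangle, because buf is destroyed when the function returns.

// Use the pointer after greeting() has already returned.
std::size_t greeting_length() {
    return std::strlen(greeting());
}

// Small self-check that the characters are still intact.
void check_greeting() {
    assert(greeting_length() == 5);
    assert(std::strcmp(greeting(), "hello") == 0);
}
```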
I'd guess that the original motive was mainly a pragmatic one: a string
literal must reside in memory and have an address. The type of a string
literal is an array type (char[] in C, char const[] in C++), and
array types convert to pointers in most contexts. The language could
have found other ways to define this (e.g. a string literal could have
pointer type to begin with, with special rules concerning what it
pointed to), but just making the literal an lvalue is probably the
easiest way of defining what is concretely needed.