The apparent underspecification of one-past-the-en

Posted 2020-03-24 07:50

Question:

It has been asked before in various forms, but since the language specification appears to be quite dynamic in this regard (or at least was dynamic when some SO discussions of this matter took place), it might make sense to revisit the matter in light of any more recent developments, if any exist.

So, the question is, again, whether a combination of & and subscript is a valid way to obtain a pointer to the imaginary past-the-end element of an array:

int a[42] = {};
&a[42];

It was considered undefined in C++98. But what about modern C++? We have seen DR#232, which nevertheless is still in "drafting" state for some reason and definitely not in the standard text (as of C++14). Is the matter still hanging in the air or has it been resolved by some alternative means?

What is interesting is that DR#315 seems to openly permit calling non-static member functions through a null pointer p (!) on the basis that "*p is not an error when p is null unless the lvalue is converted to an rvalue". It feels like the resolution of DR#315 was tentatively based on the supposedly slam-dunk future resolution of DR#232, but the latter failed to materialize. In that light, is DR#315 really a NAD?

Also, since C++11 the library specification defines dereferenceable iterators simply as iterators for which the expression *it is valid, which in the case of std::vector would/might largely delegate the matter to the above issue for raw arrays, and apparently opens the door for dereferenceable std::vector::end() iterators. This potentially makes the following code valid:

std::vector<int> v(42);
&v[42];

Is it really valid? Some older answers on SO categorically state that dereferencing standard end() iterators is always undefined. But it does not appear to be so clear-cut in post-C++11 versions of the language. The standard says that the library implementation "never assumes" end-iterators to be dereferenceable, which means that they are not unconditionally non-dereferenceable anymore.
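
For comparison, here is a minimal sketch of how the same address can be formed without subscripting past the end at all, so the question above never arises. The helper name storage_end is hypothetical (it is not a library function); it only relies on std::vector::data() and std::vector::size().

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: address one past the last element of the vector's
// storage, computed by pointer arithmetic only. No operator[] call and no
// indirection through the past-the-end position is involved.
const int* storage_end(const std::vector<int>& v) {
    return v.data() + v.size();
}
```

With std::vector<int> v(42), storage_end(v) compares equal to &v[41] + 1, which subscripts a real element before stepping forward.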

P.S. I have already seen the discussion "Lvalues which do not designate objects in C++14", but it seems to be focused specifically on the validity of reference initialization, which I don't want to bring up here.

Answer 1:

To the best of my understanding, the expression &v[42] (or &a[42]) dereferences the past-the-end pointer, and that is undefined.

Based on N4140:

[expr.unary.op]/1

The unary * operator performs indirection: the expression to which it is applied shall be a pointer to an object type, or a pointer to a function type and the result is an lvalue referring to the object or function to which the expression points.

I don't think the non-element past the last element of an array is considered an object.



Answer 2:

My best guess:

Except where it has been declared for a class (13.5.5), the subscript operator [] is interpreted in such a way that E1[E2] is identical to *((E1)+(E2)).

So a[42] is equivalent to *(a + 42).
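
As an illustration of that identity, here is a small sketch restricted to valid indices, where both spellings are unquestionably defined. The function name same_element is made up for this example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: checks that a[i], *(a + i) and i[a] all name the
// same element. Every form performs indirection, so only indices of real
// elements (i < 42) are accepted.
bool same_element(const int (&a)[42], std::size_t i) {
    assert(i < 42);                      // precondition: i refers to a real element
    return &a[i] == &*(a + i)            // same address
        && a[i] == *(a + i)              // same value
        && i[a] == a[i];                 // E1[E2] is *((E1)+(E2)), so the operands commute
}
```

The sketch deliberately stops at index 41; whether the same rewrite is harmless for index 42 is exactly what is in dispute here.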

[§ 5.7 Additive operators]

When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. ...

... If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.

So a + 42 seems to return a valid T* pointer, which would dereference into a T (per [expr.unary.op])

If the type of the expression is “pointer to T,” the type of the result is “T.”

There is also the following note:

[§ 3.9.2 Compound types]

Note: For instance, the address one past the end of an array (5.7) would be considered to point to an unrelated object of the array’s element type that might be located at that address.

It seems like it is valid. I still think assigning to it would be undefined behavior (due to the note, that it is an unrelated object), but getting the address appears to be defined.

That being said, &a[41] + 1 is defined (thanks to 5.7) and avoids the issue completely; maybe just do that.
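
A rough sketch of that workaround, next to two other spellings that also involve no indirection through the past-the-end position. The helper name distance_to_end is invented for this example; std::end is the standard array overload from <iterator>.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical helper: forms the one-past-the-end pointer three ways,
// checks that they agree, and returns its distance from the first element.
template <std::size_t N>
std::ptrdiff_t distance_to_end(int (&a)[N]) {
    int* by_arithmetic = a + N;             // pointer arithmetic only (5.7)
    int* by_last_plus_one = &a[N - 1] + 1;  // subscript a real element, then step
    int* by_std_end = std::end(a);          // library spelling of the same pointer
    assert(by_arithmetic == by_last_plus_one);
    assert(by_arithmetic == by_std_end);
    return by_arithmetic - a;               // equals N
}
```

For int a[42] = {}; the call distance_to_end(a) returns 42, and none of the three pointers is ever dereferenced.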