Given int foo[] = {0, 1, 2, 3};
I want to know if iterators that point past the "one past-the-end" are invalid. For example: auto bar = cend(foo) + 1;
There are a ton of complaints and warnings that this is "undefined behavior" in Stack Overflow questions like this: c++ what's the result of iterator + integer when past-end-iterator? Unfortunately the only source is hand waving.
I'm having more and more trouble buying that, for example:
int* bar;
Is uninitialized, but certainly does not invoke undefined behavior, and given enough tries I'm sure I could find an instance where the value in this uninitialized bar
had the same value as cend(foo) + 1
.
One of the big confusions here is that I am not asking about dereferencing cend(foo) + 1
. I know that would be undefined behavior and the standard forbids it. But answers like this: https://stackoverflow.com/a/33675281/2642059 which cite only that dereferencing such an iterator is illegal do not answer the question.
I also know that C++ only guarantees that cend(foo)
will be valid, but it could be numeric_limits<int*>::max()
, in which case cend(foo) + 1
would overflow. I'm not interested in that case unless it is called out in the standard as the reason we can't have an iterator past the "one past-the-end". I know that int*
really just holds an integer value, and as such is subject to overflow.
I would like a citation from a credible source that moving an iterator beyond the "one past-the-end" element is undefined behavior.
Yes, your program has undefined behaviour if you form such a pointer.
That's because the only way you can do so is to increment a valid pointer past the bounds of the object it points inside, and that is an undefined operation.
[C++14: 5.7/5]:
When an expression that has integral type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integral expression. In other words, if the expression P
points to the i-th element of an array object, the expressions (P)+N
equivalently, N+(P)
) and (P)-N
(where N
has the value n) point to, respectively, the i + n-th and i − n-th elements of the array object, provided they exist. Moreover, if the expression P
points to the last element of an array object, the expression (P)+1
points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q)-1
points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined.
An uninitialised pointer is not the same thing because you never did anything to "get" that pointer, other than declaring it (which is obviously valid). But you can't even evaluate it (not dereference — evaluate) without imbuing your program with undefined behaviour. Not until you've assigned it a valid value.
As a sidenote, I would not call these "past-the-end" iterators/pointers, a term in C++ which specifically means the "one past-the-end" iterator/pointer, which is valid (e.g. cend(foo)
itself). You're waaaay past the end. ;)
TL;DR -- It is undefined behavior to compute an iterator past the one-past-the-end iterator because a precondition is violated in the process.
Lightness provided the quote that authoritatively covers pointers.
For iterators, incrementing past the "end" (one-past-the-last-element) is not prohibited generally, but it IS prohibited for most of the various kinds of iterators:
The input iterator requirements, and the only incrementable if dereferenceable clause in particular, are incorporated by reference into forward, bidirectional, and random-access iterators.
Output iterators are not so constrained, they are always incrementable. Because there is no end, iterators past-the-one-past-the-end are excluded by definition, so worrying about whether they would be legal to compute is moot.
Then, jumping forward in the sequence is defined in terms of individual incrementation, so we conclude that computation of a past-the-one-past-the-end iterator is either meaningless or illegal for all iterator types.
I'm not interested in that case unless it is called out in the standard as the reason we can't have an iterator past the "one past-the-end". I know that int* really just holds an integer value, and as such is subject to overflow.
The standard doesn't discuss reasons for making things undefined. You've got the logic backwards: The fact that it's undefined is the reason that an implementation may put an object in a location where doing such a thing would otherwise cause an overflow. If a "two-past-the-end" iterator were required to be valid, then implementations would be required not to put an object somewhere that might cause such an operation to overflow.
As @Random842 said so well:
The standard does not describe pointer types as being in a flat linear space with a minimum and a maximum and everything between being valid, as you seem to assume they are
Pointers are not presumed to exist in a flat linear space. Instead, there are valid pointers, and invalid pointers. Some operations on pointers are defined, others are undefined behavior.
On many modern systems, pointers are implemented in a flat linear space. Even on these systems, the undefinedness of forming some pointers can open your C++ compiler to some optimizations; for example, int foo[5]; bool test(int* it1) { int* it2 = cend(foo); return it1 <= it2; }
can be optimized to true
as there are no pointers that can be validly compared to it2
that are not less than or equal to it.
In less contrived situations (like some loops) this could save cycles on every loop.
It is unlikely that the pointer model was developed with this in mind. There are pointer implementations that are not in a flat linear space.
Segmented memory is the most well known. In old x86 systems, each pointer is a pair of 16 bit values. The location they refer to in the linear 20 bit address space is high << 4 + low
, or segment << 4 + offset
.
Objects live within a segment, and have a constant segment value. This means all defined pointer <
comparisons can simply compare offset
, the low 16 bits. They don't have to do that math (which, at the time was expensive), they can discard the high 16 bits and compare the offset values when ordering.
There are other architectures, where code exists on a parallel address space to data (so comparing code pointers to data pointers can return spurious equality).
The rules are pretty simple. Can create pointers to elements in arrays, and to the one-past-the-end (this means that the segmented memory system cannot build arrays that reach the very end of the segment).
Now, your memory isn't segmented, so this isn't your problem, right? The compiler is free to interpret your forming of ptr+2
along a certain code branch to mean that ptr
is not a pointer to the last element of an array, and optimize accordingly. If that isn't true, your code could behave in unexpected ways.
And there are instances of real compilers in the wild using techniques like that (assuming code does not use undefined behavior, proving invariants from it, using conclusions to change behavior before the undefined behavior occurs), if not that particular case. Undefined behavior can time travel, even if the underlying hardware implementation "would have no problem with it" without any optimizations.