I have been using the highly concise and intuitive C++ syntax for finding the intersection of two sorted vector
s and putting the result in a third vector
:
vector<bar> a,b,c;
//...
std::set_intersection(a.begin(),a.end(),b.begin(),b.end(),
std::back_inserter(c));
This should set c
to intersection(a
,b
), assuming a
and b
are sorted.
But what if I just use c.begin()
(I thought I saw an example somewhere of this, which is why I did):
std::set_intersection(a.begin(),a.end(),b.begin(),b.end(),
c.begin());
set_intersection
expects an OutputIterator
at that parameter. The standard I believe requires only that c.begin()
return a forward iterator
, which I suppose might or might not be an OutputIterator
.
Anyway, the code with c.begin()
compiled under clang.
What is guaranteed to happen under the standard? If this compiles, what is likely to happen - that is, when the iterator returned by c.begin()
is eventually incremented past the end of the vector, and an attempt is made to access the element pointed to, what must/may happen? Can a conforming implementation silently extend the vector in this case, so that begin()
is in fact an appending OutputIterator
like back_inserter
is?
I'm asking this mainly to understand how the standard works with iterators: what's really going on, so I can move beyond copy-and-paste in using the STL.
The important requirement for an output iterator is that it be valid and write-able for the range
[out, out+
size of output)
.
Passing c.begin()
will lead to the values being overwritten which only works if the container c
holds enough elements to overwrite. Imagine that c.begin()
returns a pointer to an array of size 0 - then you'll see the problem when writing *out++ = 7;
.
back_inserter
adds every assigned value to a vector
(via push_back
) and provides a concise way of making the STL-algorithms extend a range - it overloads the operators that are used for iterators appropriately.
Thus
std::set_intersection(a.begin(),a.end(),b.begin(),b.end(),
c.begin());
invokes undefined behavior once set_intersection
writes something to its output iterator, that is, when the set intersection of a
and b
isn't empty.
Can a conforming implementation silently extend the vector in this case, so that begin() is in fact an appending OutputIterator
like back_inserter
is?
Of course. It's undefined behavior. (This is a humorous approach of telling you that you shouldn't even consider using this, no matter the effects on any implementation.)
back_inserter
inserts the element in the range by calling push_back
( that's why you can't use back_inserter
with the range which doesn't provide push_back
operation).
So, you won't care about going past the end of the range as push_back
automatically expands the container. However, that's not the case with insert using begin()
.
If you are using begin()
, then you have to make sure that destination range is big enough to hold all elements. Failing to do so would instantly transport your code to the realm of undefined behavior.
It compiles fine because you get a valid iterator back from the begin
function, but if the vector is empty then you will get back the end
iterator, and then continue from there.
It will work only if the destination vector already contains at least as many elements as you try to add, and then it will actually overwrite those elements and not add new.
And adding elements is just what the back_inserter
iterator does, it returns an iterator that basically does push_back
on the vector.