C++ std::vector<>::iterator is not a pointer, w

2019-04-18 04:05发布

站内文章 / C++

11 0

闹够了就滚

女 | 书童

私信

可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

Just a little introduction, with simple words. In C++, iterators are "things" on which you can write at least the dereference operator *it, the increment operator ++it, and for more advanced bidirectional iterators, the decrement --it, and last but not least, for random access iterators we need operator index it[] and possibly addition and subtraction.

Such "things" in C++ are objects of types with the according operator overloads, or plain and simple pointers.

std::vector<> is a container class that wraps a continuous array, so pointer as iterator makes sense. On the nets, and in some literature you can find vector.begin() used as a pointer.

The rationale for using a pointer is less overhead, higher performance, especially if an optimizing compiler detects iteration and does its thing (vector instructions and stuff). Using iterators might be harder for the compiler to optimize.

Knowing this, my question is why modern STL implementations, let's say MSVC++ 2013 or libstdc++ in Mingw 4.7, use a special class for vector iterators?

回答1:

You're completely correct that vector::iterator could be implemented by a simple pointer (see here) -- in fact the concept of an iterator is based on that of a pointer to an array element. For other containers, such as map, list, or deque, however, a pointer won't work at all. So why is this not done? Here are three reasons why a class implementation is preferrable over a raw pointer.

Implementing an iterator as separate type allows additional functionality (beyond what is required by the standard), for example (added in edit following Quentins comment) the possibility to add assertions when dereferencing an iterator, for example, in debug mode.
overload resolution If the iterator were a pointer T*, it could be passed as valid argument to a function taking T*, while this would not be possible with an iterator type. Thus making std::vector<>::iterator a pointer in fact changes the behaviour of existing code. Consider, for example,
```
template<typename It>
void foo(It begin, It end);
void foo(const double*a, const double*b, size_t n=0);

std::vector<double> vec;
foo(vec.begin(), vec.end());    // which foo is called?
```
argument-dependent lookup (ADL; pointed out by juanchopanza) If you make an unqualified call, ADL ensures that functions in namespace std will be searched only if the arguments are types defined in namespace std. So,
```
std::vector<double> vec;
sort(vec.begin(), vec.end());             // calls std::sort
sort(vec.data(), vec.data()+vec.size());  // fails to compile
```
std::sort is not found if vector<>::iterator were a mere pointer.

回答2:

The implementation of the iterator is implementation defined, so long as fulfills the requirements of the standard. It could be a pointer for vector, that would work. There are several reasons for not using a pointer;

consistency with other containers.
debug and error checking support
overload resolution, class based iterators allow for overloads to work differentiating them from plain pointers

If all the iterators were pointers, then ++it on a map would not increment it to the next element since the memory is not required to be not-contiguous. Past the contiguous memory of std:::vector most standard containers require "smarter" pointers - hence iterators.

The physical requirement's of the iterator dove-tail very well with the logical requirement that movement between elements it a well defined "idiom" of iterating over them, not just moving to the next memory location.

This was one of the original design requirements and goals of the STL; the orthogonal relationship between the containers, the algorithms and connecting the two through the iterators.

Now that they are classes, you can add a whole host of error checking and sanity checks to debug code (and then remove it for more optimised release code).

Given the positive aspects class based iterators bring, why should or should you not just use pointers for std::vector iterators - consistency. Early implementations of std::vector did indeed use plain pointers, you can use them for vector. Once you have to use classes for the other iterators, given the positives they bring, applying that to vector becomes a good idea.

回答3:

The rationale for using a pointer is less overhead, higher performance, especially if an optimizing compiler detects iteration and does its thing (vector instructions and stuff). Using iterators might be harder for the compiler to optimize.

It might be, but it isn't. If your implementation is not utter shite, a struct wrapping a pointer will achieve the same speed.

With that in mind, it's simple to see that simple benefits like better diagnostic messages (naming the iterator instead of T*), better overload resolution, ADL, and debug checking make the struct a clear winner over the pointer. The raw pointer has no advantages.

回答4:

Because STL was designed with the idea that you can write something that iterates over an iterator, no matter whether that iterator's just equivalent to a pointer to an element of memory-contiguous arrays (like std::array or std::vector) or something like a linked list, a set of keys, something that gets generated on the fly on access etc.

Also, don't be fooled: In the vector case, dereferencing might (without debug options) just break down to a inlinable pointer dereference, so there wouldn't even be overhead after compilation!

回答5:

The rationale for using a pointer is less overhead, higher performance, especially if an optimizing compiler detects iteration and does its thing (vector instructions and stuff). Using iterators might be harder for the compiler to optimize.

This is the misunderstanding at the heart of the question. A well formed class implementation will have no overhead, and identical performance all because the compiler can optimize away the abstraction and treat the iterator class as just a pointer in the case of std::vector.

That said,

MSVC++ 2013 or libstdc++ in Mingw 4.7, use a special class for vector iterators

because they view that adding a layer of abstraction class iterator to define the concept of iteration over a std::vector is more beneficial than using an ordinary pointer for this purpose.

Abstractions have a different set of costs vs benefits, typically added design complexity (not necessarily related to performance or overhead) in exchange for flexibility, future proofing, hiding implementation details. The above compilers decided this added complexity is an appropriate cost to pay for the benefits of having an abstraction.

回答6:

I got around this pesky obstacle by dereferencing and immediately referencing the iterator again. It looks ridiculous, but it satisfies MSVC...

class Thing {
  . . .
};

void handleThing(Thing* thing) {
  // do stuff
}

vector<Thing> vec;
// put some elements into vec now

for (auto it = vec.begin(); it != vec.end(); ++it)
  // handleThing(it);   // this doesn't work, would have been elegant ..
  handleThing(&*it);    // this DOES work