Why does push_back or push_front invalidate a dequ

2019-01-23 11:55发布

问题:

As the title asks.

My understanding of a deque was that it allocated "blocks". I don't see how allocating more space invalidates iterators, and if anything, one would think that a deque's iterators would have more guarantees than a vector's, not less.

回答1:

The C++ standard doesn't specify how deque is implemented. It isn't required to allocate new space by allocating a new chunk and chaining it on to the previous ones, all that's required is that insertion at each end be amortized constant time.

So, while it's easy to see how to implement deque such that it gives the guarantee you want[*], that's not the only way to do it.

[*] Iterators have a reference to an element, plus a reference to the block it's in so that they can continue forward/back off the ends of the block when they reach them. Plus I suppose a reference to the deque itself, so that operator+ can be constant-time as expected for random-access iterators -- following a chain of links from block to block isn't good enough.



回答2:

What's more interesting is that push_back and push_front will not invalidate any references to a deque's elements. Only iterators are to be assumed invalid.

The standard, to my knowledge, doesn't state why. However if an iterator were implemented that was aware of its immediate neighbors - as a list is - that iterator would become invalid if it pointed to an element that was both at the edge of the deque and the edge of a block.



回答3:

My guess. push_back/push_front can allocate a new memory block. A deque iterator must know when increment/decrement operator should jump into the next block. The implementation may store that information in iterator itself. Incrementing/decrementing an old iterator after push_back/push_front may not work as intended.

This code may or may not fail with run time error. On my Visual Studio it failed in debug mode but run to the conclusion in release mode. On Linux it caused segmentation fault.

#include <iostream>
#include <deque>

int main() {
    std::deque<int> x(1), y(1);
    std::deque<int>::iterator iterx = x.begin();
    std::deque<int>::iterator itery = y.begin();

    for (int i=1; i<1000000; ++i) {
        x.push_back(i);
        y.push_back(i);
        ++iterx;
        ++itery;
        if(*iterx != *itery) {
            std::cout << "increment failed at " << i << '\n';
            break;
        }
    }
}


回答4:

The key thing is not to make any assumptions just treat the iterator as if it will be invalidated.

Even if it works fine now, a later version of the compiler or the compiler for a different platform might come along and break your code. Alternatively, a colleague might come along and decide to turn your deque into a vector or linked list.



回答5:

Even when you are allocating in chunks, an insert will cause that particular chunk to be reallocated if there isn't enough space (as is the case with vectors).



回答6:

An iterator is not just a reference to the data. It must know how to increment, etc.

In order to support random access, implementations will have a dynamic array of pointers to the chunks. The deque iterator will point into this dynamic array. When the deque grows, a new chunk might need to be allocated. The dynamic array will grow, invalidating its iterators and, consequently, the deque's iterators.

So it is not that chunks are reallocated, but the array of pointers to these chunks can be. Indeed, as Johannes Schaub noted, references are not invalidated.

Also note that the deque's iterator guarantees are not less than the vector's, which are also invalidated when the container grows.



回答7:

Because the standard says it can. It does not mandate that deque be implemented as a list of chunks. It mandates a particular interface with particular pre and post conditions and particular algorithmic complexity minimums.

Implementors are free to implement the thing in whatever way they choose, so long as it meets all of those requirements. A sensible implementation might use lists of chunks, or it might use some other technique with different trade-offs.

It's probably impossible to say that one technique is strictly better than another for all users in all situations. Which is why the standard gives implementors some freedom to choose.