Problem with invalidation of STL iterators when calling erase

Published 2019-04-09 14:57

Question:

The C++ standard specifies that erasing from containers such as std::deque invalidates iterators into the container (for std::list, only iterators to the erased elements are invalidated).

My question is as follows: given a std::deque of integers and a pair of indices denoting a range of elements within it, what is the correct way to delete all even elements in that range?

So far I have the following, however the problem here is that the assumed end is invalidated after an erase:

#include <cstddef>
#include <deque>

int main()
{
   std::deque<int> deq;
   for (int i = 0; i < 100; deq.push_back(i++));

   // range, 11th to 51st element
   std::pair<std::size_t,std::size_t> r(10,50);

   std::deque<int>::iterator it = deq.begin() + r.first;
   std::deque<int>::iterator end = deq.begin() + r.second;

   while (it != end)
   {
      if (*it % 2 == 0)
      {
         // Bug: erase() invalidates all iterators, including 'end'
         it = deq.erase(it);
      }
      else
         ++it;
   }

   return 0;
}

Examining how std::remove_if is implemented, it seems there is a very costly copy/shift down process going on.

  • Is there a more efficient way of achieving the above without all the copying/shifting?

  • In general, is deleting/erasing an element more expensive than swapping it with the next nth value in the sequence (where n is the number of elements deleted/removed so far)?

Note: answers should assume the sequence size is quite large (1 million+ elements) and that on average roughly 1/3 of the elements would be up for erasure.

Answer 1:

I'd use the Erase-Remove Idiom. I think the linked Wikipedia article even shows exactly what you're doing: removing the even elements.

The copying that remove_if does is no more costly than what happens when you delete elements from the middle of the container. It might even be more efficient.



Answer 2:

Calling .erase() also results in "a very costly copy/shift down process". When you erase an element from the middle of the container, every other element after that point must be shifted down one spot into the available space. If you erase multiple elements, you incur that cost for every erased element. Some of the non-erased elements move several spots, but are forced to move one spot at a time instead of all at once. That is very inefficient.

The standard library algorithms std::remove and std::remove_if optimize this work. They use a clever trick to ensure that every moved element is only moved once. This is much, much faster than what you are doing yourself, contrary to your intuition.

The pseudocode is like this:

read_location <- beginning of range.
write_location <- beginning of range.
while read_location != end of range:
    if the element at read_location should be kept in the container:
        copy the element at the read_location to the write_location.
        increment the write_location.
    increment the read_location.

As you can see, every element in the original sequence is considered exactly once, and if it needs to be kept, it gets copied exactly once, to the current write_location. It will never be looked at again, because the write_location can never run in front of the read_location.



Answer 3:

Remember that deque stores its elements in contiguous memory blocks (it is not one contiguous array like vector, but erasing mid-container still means copying subsequent elements over the hole). You just want to make sure you're doing a single pass, copying each not-to-be-deleted object directly to its final position, rather than shifting all subsequent objects one spot on every delete. remove_if is efficient and appropriate in this regard; your erase loop is not: it does massive amounts of unnecessary copying.

FWIW - alternatives:

  • add a "deleted" state to your objects and mark them deleted in place, but then every time you operate on the container you'll need to check yourself
  • use a list, which is implemented using pointers to previous and next elements, such that removing a list element relinks the adjacent nodes' pointers to bypass that element: no copying, efficient iteration, just no random access, more small (i.e. inefficient) heap allocations and pointer overheads

What to choose depends on the nature, relative frequency, and performance requirements of specific operations (e.g. it may be that you can afford slow removals if they're done at non-critical times, but need fastest-possible iteration - whatever it is, make sure you understand your needs and the implications of the various data structures).



Answer 4:

One alternative that hasn't been mentioned is to create a new deque, copy the elements that you want to keep into it, and swap it with the old deque.

#include <algorithm>
#include <cstddef>
#include <deque>
#include <iterator>
#include <utility>

void filter(std::deque<int>& in, std::pair<std::size_t,std::size_t> range) {
    std::deque<int> out;
    std::deque<int>::const_iterator first = in.begin();
    std::deque<int>::const_iterator curr = first + range.first;
    std::deque<int>::const_iterator last = first + range.second;
    // Note: std::deque has no reserve() member, so we cannot preallocate here.
    std::copy(first, curr, std::back_inserter(out));
    while (curr != last) {
        if (*curr & 1) {  // keep odd elements
            out.push_back(*curr);
        }
        ++curr;
    }
    std::copy(last, in.end(), std::back_inserter(out));
    in.swap(out);
}

I'm not sure whether you have enough memory to create a copy, but it is usually faster and easier to make a copy than to erase elements in place in a large collection. If you still see memory thrashing, figure out how many elements you are going to keep by calling std::count_if first; since std::deque has no reserve() member, you would need a std::vector for the output to preallocate that many and get a single memory allocation.