Difference between std::remove and erase for vecto

2019-01-30 14:54发布

问题:

I have a doubt that I would like to clarify in my head. I am aware of the different behavior for std::vector between erase and std::remove where the first physically removes an element from the vector, reducing size, and the other just moves an element leaving the capacity the same.

Is this just for efficiency reasons? By using erase, all elements in a std::vector will be shifted by 1, causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy, that difference might matter, right?

回答1:

Is this just for efficiency reason? By using erase all elements in a std::vector will be shifted by 1 causing a large amount of copies; std::remove does just a 'logical' delete and leaves the vector unchanged by moving things around. If the objects are heavy that difference mihgt matter, right?

The reason for using this idiom is exactly that. There is a benefit in performance, but not in the case of a single erasure. Where it does matter is if you need to remove multiple elements from the vector. In this case, the std::remove will copy each not removed element only once to its final location, while the vector::erase approach would move all of the elements from the position to the end multiple times. Consider:

std::vector<int> v{ 1, 2, 3, 4, 5 };
// remove all elements < 5

If you went over the vector removing elements one by one, you would remove the 1, causing copies of the remainder elements that get shifted (4). Then you would remove 2 and shift all remainding elements by one (3)... if you see the pattern this is a O(N^2) algorithm.

In the case of std::remove the algorithm maintains a read and write heads, and iterates over the container. For the first 4 elements the read head will be moved and the element tested, but no element is copied. Only for the fifth element the object would be copied from the last to the first position, and the algorithm will complete with a single copy and returning an iterator to the second position. This is a O(N) algorithm. The later std::vector::erase with the range will cause destruction of all the remainder elements and resizing the container.

As others have mentioned, in the standard library algorithms are applied to iterators, and lack knowledge of the sequence being iterated. This design is more flexible than other approaches on which algorithms are aware of the containers in that a single implementation of the algorithm can be used with any sequence that complies with the iterator requirements. Consider for example, std::remove_copy_if, it can be used even without containers, by using iterators that generate/accept sequences:

std::remove_copy_if(std::istream_iterator<int>(std::cin),
                    std::istream_iterator<int>(),
                    std::ostream_iterator<int>(std::cout, " "),
                    [](int x) { return !(x%2); } // is even
                    );

That single line of code will filter out all even numbers from standard input and dump that to standard output, without requiring the loading of all numbers into memory in a container. This is the advantage of the split, the disadvantage is that the algorithms cannot modify the container itself, only the values referred to by the iterators.



回答2:

std::remove is an algorithm from the STL which is quite container agnostic. It requires some concept, true, but it has been designed to also work with C arrays, which are static in sizes.



回答3:

std::remove simply returns a new end() iterator to point to one past the last non-removed element (the number of items from the returned value to end() will match the number of items to be removed, but there is no guarantee their values are the same as those you were removing - they are in a valid but unspecified state). This is done so that it can work for multiple container types (basically any container type that a ForwardIterator can iterate through).

std::vector::erase actually sets the new end() iterator after adjusting the size. This is because the vector's method actually knows how to handle adjusting it's iterators (the same can be done with std::list::erase, std::deque::erase, etc.).

remove organizes a given container to remove unwanted objects. The container's erase function actually handles the "removing" the way that container needs it to be done. That is why they are separate.



回答4:

I think it has to do with needing direct access to the vector itself to be able to resize it. std::remove only has access to the iterators, so it has no way of telling the vector "Hey, you now have fewer elements".

See yves Baumes answer as to why std::remove is designed this way.



回答5:

Yes, that's the gist of it. Note that erase is also supported by the other standard containers where its performance characteristics are different (e.g. list::erase is O(1)), while std::remove is container-agnostic and works with any type of forward iterator (so it works for e.g. bare arrays as well).



回答6:

Kind of. Algorithms such as remove work on iterators (which are an abstraction to represent an element in a collection) which do not necessarily know which type of collection they are operating on - and therefore cannot call members on the collection to do the actual removal.

This is good because it allows algorithms to work generically on any container and also on ranges that are subsets of the entire collection.

Also, as you say, for performance - it may not be necessary to actually remove (and destroy) the elements if all you need is access to the logical end position to pass on to another algorithm.



回答7:

Standard library algorithms operate on sequences. A sequence is defined by a pair of iterators; the first points at the first element in the sequence, and the second points one-past-the-end of the sequence. That's all; algorithms don't care where the sequence comes from.

Standard library containers hold data values, and provide a pair of iterators that specify a sequence for use by algorithms. They also provide member functions that may be able to do the same operations as an algorithm more efficiently by taking advantage of the internal data structure of the container.



回答8:

Try following code to get better understanding.

std::vector<int> v = {1, 2, 3, 4, 5, 6, 7, 8};
const auto newend (remove(begin(v), end(v), 2));

for(auto a : v){
    cout << a << " ";
}
cout << endl;
v.erase(newend, end(v));
for(auto a : v){
    cout << a << " ";
}


标签: c++ vector stl