Best way to erase vector of ranges from std::vecto

2019-06-04 08:27发布

问题:

In one of my projects it is necessary to remove certain elements from a std::vector<double> values. The indices I have to remove are given as a vector of intervals. For example {1,3} means, that I have to remove indices from 1 to 3 inclusive from values.

I can assume that the intervals given are mutually exclusive.

The code shown below illustrates, what the desired behavior should look like.

#include <iostream>
#include <vector>

int main(int argc, char** args) {
    // Intervals of indices I have to remove from values
    std::vector<std::pair<int, int>> intervals = { {1,3},{7,9},{13,13} }; 

    // Vector of arbitrary values. 
    std::vector<double> values = {4.2,6.4,2.3,3.4,9.1,2.3,0.6,1.2,0.3,0.4,6.4,3.6,1.4,2.5,7.5 }
    removeIntervals(values, intervals);
    // intervals should contain 4.2,9.1,2.3,0.6,6.4,3.6,1.4,7.5
}

What might be the shortest amount of code necessary to achieve this?

My best solution so far is:

 void removeIntervals(std::vector<double>& values, const std::vector < std::pair<int, int>>& intervals) {
    std::vector<bool> flags(values.size(), true);
    std::vector<double> ret;
    for (auto interval : intervals) {
        std:fill(flags.begin() + interval.first, flags.begin()+interval.second+1, false);
    }
    for (auto i = 0; i < values.size(); i++) {
        if (flags[i]) ret.push_back(values[i]);
    }
    values = ret;
 }

I can assume, that my intervals are non-overlapping and consecutive. It seems, that it boils down to perform erase from back to front.

void removeIntervals2(std::vector<double>& values, const std::vector < std::pair<int, int>>& intervals) {
    auto revIntervals = intervals;
    std::reverse(revIntervals.begin(), revIntervals.end());
    for (auto interval : revIntervals) {
        values.erase(std::begin(values) + interval.first, std::begin(values) + interval.second + 1);
    }
}

回答1:

Since you can assume that the intervals don't overlap and are increasing order, the solution is to start at the back (so that the indexes don't change) and remove each range in turn:

So for the minimum amount of code, which you asked for:

for (auto& it = intervals.rbegin(); it != intervals.rend(); ++it) {
  values.erase(values.begin() + it->first, std::next(values.begin() + it->second));

The down side of this is that this will involve lots of shuffling of the vector. Really what you would want to do is swap the last unswapped item at the end of the vector, with the item you want to remove, and then resize when you're finished to cut off the end; but this requires more code.



回答2:

The problem is non-trivial since after a first call to vector::erase() all indices/iterators to elements past the first erased one are invalidated, including further interval to be removed.

Therefore, using vector::erase() has to be done in descending order of the elements to be erased.

Another inconvenience originates from using int indices instead of iterators for the interval boundaries. Finally, vector::erase() copies (ore moves) all elements past the last removed elements to fill the gap. This preserves the order of values, but causes excessive copying (moving) in case of several intervals.

A more efficient way is to swap only the elements to be removed and finally down-size the vector.



回答3:

What you surely want is a solution with not only short code but also good efficiency, minimizing the copies and shifts in the vector of values.

I would definitely go with the first part of your solution, that is falgging the positions to be kept or deleted.

std::vector<bool> flags(values.size(), true);
for (auto interval : intervals) {
    std:fill(flags.begin() + interval.first, flags.begin()+interval.second+1, false);
}

For the second part, the shortest and most efficient would be the erase/remove_if idiom:

 values.erase(std::remove_if(begin(values), end(values),
    [&](const auto& v) { return !flags[&v - &(*values.begin())];}),
  values.end());

The efficiency here is due to that remove_if will first mark the elements that need to be removed, then it will compact the vector by putting first the elements to stay and returning the position of the first element to remove. Finally, the erase will shrink the vector. From an algorithmic point of view, this solution is likely optimal. It should pay out for large vectors.



回答4:

Thought I'd post an answer that was a little more fault tolerant. If your intervals are larger than the input array for example if intervals had included {15, 15} this will still function correctly. Additionally this is faster than UKMonkey's solution because it does all the work in a single pass:

It has come to my attention that this code is implementation defined, and only works on Clang and Visual Studio 2015 Update 3:

values.resize(distance(begin(values), remove_if(begin(values), end(values), [i = 0U, it = cbegin(intervals), end = cend(intervals)](const auto&) mutable { return it != end && ++i > it->first && (i <= it->second || (++it, true)); })));

Live Example

You can accomplish the same thing in a for-loop though:

size_t write = 0U;
auto it = cbegin(intervals);

for (size_t read = 0U; read < size(values); ++read) {
    if (it == cend(intervals) || read < it->first) {
        values[write++] = values[read];
    } else if (read == it->second) {
        ++it;
    }
}

values.resize(write);

Live Example

If you are hooked on "the shortest amount of code necessary to achieve this," you can use my evil , from the lambda in the for-loop too:

for (size_t read = 0U; read < size(values); ++read) if (it == cend(intervals) || read < it->first || (read == it->second && (++it, false))) values[write++] = values[read];


回答5:

Well, the answers so far are all bad -- either making whole new vectors or requiring O(N^2) time -- so I'll add this one.

Instead of erasing the elements you don't want to keep, and shifting the rest every time, you move the ones you do want to keep into the proper position, and then just truncate the vector.

O(N) time and no extra space:

void removeIntervals(std::vector<double>& values, const std::vector < std::pair<int, int>>& intervals) {
    if (intervals.size()<=0)
        return;

    //keep the part before the first interval
    auto dest = values.begin()+intervals[0].first;

    for (size_t i=0; i<intervals.size(); ++i) {

        //copy the part to keep after each interval
        auto s = values.cbegin()+intervals[i].second+1;
        auto e = (i+i >= intervals.size() ?
                  values.cend() : 
                  values.cbegin()+intervals[i+1].first);
        while(s<e) {
            *dest++=*s++;
        }
    }
    values.erase(dest,values.end());
 }