What's the most efficient way to erase duplica

2018-12-31 09:19发布

I need to take a C++ vector with potentially a lot of elements, erase duplicates, and sort it.

I currently have the below code, but it doesn't work.

vec.erase(
      std::unique(vec.begin(), vec.end()),
      vec.end());
std::sort(vec.begin(), vec.end());

How can I correctly do this?

Additionally, is it faster to erase the duplicates first (similar to coded above) or perform the sort first? If I do perform the sort first, is it guaranteed to remain sorted after std::unique is executed?

Or is there another (perhaps more efficient) way to do all this?

20条回答
不再属于我。
2楼-- · 2018-12-31 09:28

If you do not want to change the order of elements, then you can try this solution:

template <class T>
void RemoveDuplicatesInVector(std::vector<T> & vec)
{
    set<T> values;
    vec.erase(std::remove_if(vec.begin(), vec.end(), [&](const T & value) { return !values.insert(value).second; }), vec.end());
}
查看更多
无色无味的生活
3楼-- · 2018-12-31 09:30

sort(v.begin(), v.end()), v.erase(unique(v.begin(), v,end()), v.end());

查看更多
栀子花@的思念
4楼-- · 2018-12-31 09:32

Efficiency is a complicated concept. There's time vs. space considerations, as well as general measurements (where you only get vague answers such as O(n)) vs. specific ones (e.g. bubble sort can be much faster than quicksort, depending on input characteristics).

If you have relatively few duplicates, then sort followed by unique and erase seems the way to go. If you had relatively many duplicates, creating a set from the vector and letting it do the heavy lifting could easily beat it.

Don't just concentrate on time efficiency either. Sort+unique+erase operates in O(1) space, while the set construction operates in O(n) space. And neither directly lends itself to a map-reduce parallelization (for really huge datasets).

查看更多
千与千寻千般痛.
5楼-- · 2018-12-31 09:33

Here's a template to do it for you:

template<typename T>
void removeDuplicates(std::vector<T>& vec)
{
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}

call it like:

removeDuplicates<int>(vectorname);
查看更多
流年柔荑漫光年
6楼-- · 2018-12-31 09:33

unique only removes consecutive duplicate elements (which is necessary for it to run in linear time), so you should perform the sort first. It will remain sorted after the call to unique.

查看更多
梦该遗忘
7楼-- · 2018-12-31 09:33

As already stated, unique requires a sorted container. Additionally, unique doesn't actually remove elements from the container. Instead, they are copied to the end, unique returns an iterator pointing to the first such duplicate element, and you are expected to call erase to actually remove the elements.

查看更多
登录 后发表回答