What's the most efficient way to erase duplica

I need to take a C++ vector with potentially a lot of elements, erase duplicates, and sort it.

I currently have the below code, but it doesn't work.

vec.erase(
      std::unique(vec.begin(), vec.end()),
      vec.end());
std::sort(vec.begin(), vec.end());

How can I correctly do this?

Additionally, is it faster to erase the duplicates first (similar to coded above) or perform the sort first? If I do perform the sort first, is it guaranteed to remain sorted after std::unique is executed?

Or is there another (perhaps more efficient) way to do all this?

标签： c++ sorting vector stl duplicates

20条回答

不再属于我。

2楼-- · 2018-12-31 09:28

If you do not want to change the order of elements, then you can try this solution:

template <class T>
void RemoveDuplicatesInVector(std::vector<T> & vec)
{
    set<T> values;
    vec.erase(std::remove_if(vec.begin(), vec.end(), [&](const T & value) { return !values.insert(value).second; }), vec.end());
}

0人赞添加讨论(0) 举报

无色无味的生活

3楼-- · 2018-12-31 09:30

sort(v.begin(), v.end()), v.erase(unique(v.begin(), v,end()), v.end());

0人赞添加讨论(0) 举报

栀子花@的思念

4楼-- · 2018-12-31 09:32

Efficiency is a complicated concept. There's time vs. space considerations, as well as general measurements (where you only get vague answers such as O(n)) vs. specific ones (e.g. bubble sort can be much faster than quicksort, depending on input characteristics).

If you have relatively few duplicates, then sort followed by unique and erase seems the way to go. If you had relatively many duplicates, creating a set from the vector and letting it do the heavy lifting could easily beat it.

Don't just concentrate on time efficiency either. Sort+unique+erase operates in O(1) space, while the set construction operates in O(n) space. And neither directly lends itself to a map-reduce parallelization (for really huge datasets).

0人赞添加讨论(0) 举报

千与千寻千般痛.

5楼-- · 2018-12-31 09:33

Here's a template to do it for you:

template<typename T>
void removeDuplicates(std::vector<T>& vec)
{
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}

call it like:

removeDuplicates<int>(vectorname);

0人赞添加讨论(0) 举报

流年柔荑漫光年

6楼-- · 2018-12-31 09:33

unique only removes consecutive duplicate elements (which is necessary for it to run in linear time), so you should perform the sort first. It will remain sorted after the call to unique.

0人赞添加讨论(0) 举报

梦该遗忘

7楼-- · 2018-12-31 09:33

As already stated, unique requires a sorted container. Additionally, unique doesn't actually remove elements from the container. Instead, they are copied to the end, unique returns an iterator pointing to the first such duplicate element, and you are expected to call erase to actually remove the elements.

0人赞添加讨论(0) 举报

1 2 3 4 下一页

What's the most efficient way to erase duplica

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间