I need to take a C++ vector with potentially a lot of elements, erase duplicates, and sort it.
I currently have the code below, but it doesn't work.
vec.erase(
std::unique(vec.begin(), vec.end()),
vec.end());
std::sort(vec.begin(), vec.end());
How can I correctly do this?
Additionally, is it faster to erase the duplicates first (as coded above) or to perform the sort first? If I do perform the sort first, is the vector guaranteed to remain sorted after std::unique is executed?
Or is there another (perhaps more efficient) way to do all this?
Here's an example of the duplicate-deletion problem that occurs with std::unique(). On a Linux machine, the program crashes. Read the comments for details.
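The original example isn't reproduced here; as a hypothetical sketch of one way std::unique can lead to a crash like this, assume the vector holds raw owning pointers: the elements past the iterator std::unique returns are left in a valid but unspecified state, so naively deleting everything can free the same pointer twice.

// Hypothetical sketch (not the original example). After std::unique, the
// tail of the vector is in an unspecified state; for raw pointers it often
// still holds copies of pointers kept earlier, so deleting the whole vector
// double-frees -- undefined behavior that typically aborts on Linux.
#include <algorithm>
#include <vector>

int main() {
    std::vector<int*> v;
    int* p = new int(42);
    v.push_back(p);
    v.push_back(p);            // adjacent duplicate pointer
    v.push_back(new int(7));

    auto tail = std::unique(v.begin(), v.end());

    for (int* q : v)           // BUG: also visits the unspecified tail
        delete q;              // may delete the same pointer twice
    v.erase(tail, v.end());
}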
If you don't want to modify the vector (erase, sort), then you can use the Newton library. In its algorithm sub-library there is a function called copy_single, so you can:

where copy is the vector into which you push_back the copies of the unique elements. But remember that you only push_back the elements; you don't create a new vector.
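The copy_single call itself isn't shown above and I don't know its exact signature; as a rough standard-library sketch of the same idea (push_back each element into a second vector the first time it is seen, leaving the original untouched; the helper name copy_unique is just illustrative, and the element type is assumed to work with std::unordered_set):

// Standard-library sketch of the same idea (not Newton's copy_single):
// push_back each element the first time it is seen, leaving the source
// vector untouched. Assumes the element type is hashable.
#include <unordered_set>
#include <vector>

template <class T>
void copy_unique(const std::vector<T>& from, std::vector<T>& to) {
    std::unordered_set<T> seen;
    for (const T& x : from)
        if (seen.insert(x).second)   // true only for the first occurrence
            to.push_back(x);
}

// usage:
// std::vector<int> vec = {3, 1, 3, 2, 1};
// std::vector<int> copy;            // receives the unique elements
// copy_unique(vec, copy);           // copy == {3, 1, 2}, vec unchanged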
Anyway, this is faster because you don't erase() the elements (which takes a lot of time because of reassignment, except when you pop_back()). I ran some experiments and it is faster.
Also, you can use:
Sometimes it is still faster.
You need to sort it before you call unique, because unique only removes duplicates that are next to each other.

edit: 38 seconds...
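In code, that ordering looks like this (a sketch; the function name is just illustrative):

// Sort first so duplicates become adjacent, then std::unique moves the
// unique elements to the front, and erase() removes the leftover tail.
#include <algorithm>
#include <vector>

void sort_and_dedup(std::vector<int>& vec) {
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}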
std::unique only works on consecutive runs of duplicate elements, so you'd better sort first. However, it is stable, so your vector will remain sorted.

This is a function that I created that you can use to delete repeats. The header files needed are just <iostream> and <vector>.
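The author's function isn't reproduced here; as a hypothetical sketch of a repeat-deleting function that only needs those two headers (it keeps the first occurrence of each value and preserves the original order; the name removeDuplicates is just illustrative):

// Sketch (not the author's original function): erase repeats in place,
// keeping the first occurrence of each value and preserving order.
// Only <iostream> and <vector> are needed; O(n^2), fine for small inputs.
#include <iostream>
#include <vector>

void removeDuplicates(std::vector<int>& v) {
    for (auto i = v.begin(); i != v.end(); ++i)
        for (auto j = i + 1; j != v.end(); )
            if (*j == *i)
                j = v.erase(j);   // drop the repeat; erase returns the next iterator
            else
                ++j;
}

int main() {
    std::vector<int> v = {4, 1, 4, 2, 1};
    removeDuplicates(v);
    for (int x : v) std::cout << x << ' ';   // prints: 4 1 2
    std::cout << '\n';
}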
I agree with R. Pate and Todd Gardner; a std::set might be a good idea here. Even if you're stuck using vectors, if you have enough duplicates, you might be better off creating a set to do the dirty work.

Let's compare three approaches:
1. Just using vector, sort + unique
2. Convert to set (manually)
3. Convert to set (using a constructor)
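Sketches of what those three approaches look like in code (an int vector is assumed; the function names are just illustrative):

// 1. Just using vector: sort + unique + erase.
#include <algorithm>
#include <set>
#include <vector>

void sort_unique(std::vector<int>& vec) {
    std::sort(vec.begin(), vec.end());
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}

// 2. Convert to set (manually): insert element by element, then copy back.
void via_set_manual(std::vector<int>& vec) {
    std::set<int> s;
    for (int x : vec)
        s.insert(x);
    vec.assign(s.begin(), s.end());   // a set is already sorted and duplicate-free
}

// 3. Convert to set (using a constructor): build the set from the whole range.
void via_set_ctor(std::vector<int>& vec) {
    std::set<int> s(vec.begin(), vec.end());
    vec.assign(s.begin(), s.end());
}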
Here's how these perform as the number of duplicates changes:
Summary: when the number of duplicates is large enough, it's actually faster to convert to a set and then dump the data back into a vector.
And for some reason, doing the set conversion manually seems to be faster than using the set constructor -- at least on the toy random data that I used.