I need to take a C++ vector with potentially a lot of elements, erase duplicates, and sort it.
I currently have the below code, but it doesn't work.
vec.erase(
std::unique(vec.begin(), vec.end()),
vec.end());
std::sort(vec.begin(), vec.end());
How can I correctly do this?
Additionally, is it faster to erase the duplicates first (as in the code above) or to perform the sort first? If I do perform the sort first, is the vector guaranteed to remain sorted after std::unique is executed?
Or is there another (perhaps more efficient) way to do all this?
If you do not need to preserve the original order of the elements, you can try this solution:
std::sort(v.begin(), v.end());
v.erase(std::unique(v.begin(), v.end()), v.end());
Efficiency is a complicated concept. There are time vs. space considerations, as well as general measurements (where you only get vague answers such as O(n)) vs. specific ones (e.g. bubble sort can be much faster than quicksort, depending on input characteristics).
If you have relatively few duplicates, then sort followed by unique and erase seems the way to go. If you have relatively many duplicates, creating a set from the vector and letting it do the heavy lifting could easily beat it.
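As a rough illustration of the set-based route, here is a minimal sketch. The helper name sorted_unique_via_set is hypothetical, and it assumes the element type is copyable and ordered by operator<. Whether it actually beats sort + unique + erase depends on the data, so measure before committing to either.

```cpp
#include <set>
#include <vector>

// Hypothetical helper: returns a sorted copy of `v` with duplicates removed
// by routing the elements through a std::set.
template <typename T>
std::vector<T> sorted_unique_via_set(const std::vector<T>& v) {
    // std::set keeps its elements ordered and ignores duplicate insertions.
    std::set<T> s(v.begin(), v.end());
    // Iterating a std::set visits the elements in ascending order.
    return std::vector<T>(s.begin(), s.end());
}
```

Note that the set holds a second copy of every distinct element, which is where the extra O(n) space comes from.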
Don't just concentrate on time efficiency either. Sort+unique+erase operates in O(1) space, while the set construction operates in O(n) space. And neither directly lends itself to a map-reduce parallelization (for really huge datasets).
Here's a template to do it for you:
call it like:
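A minimal sketch of such a helper might look like the following. The name sort_and_dedupe is hypothetical, and the sketch assumes the element type supports operator< and operator==.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical helper: sorts `vec` in place, then removes duplicates.
template <typename T>
void sort_and_dedupe(std::vector<T>& vec) {
    // Sorting first makes equal elements adjacent.
    std::sort(vec.begin(), vec.end());
    // std::unique compacts the distinct elements to the front;
    // erase then drops the leftover tail.
    vec.erase(std::unique(vec.begin(), vec.end()), vec.end());
}
```

With that in place, a call could be as simple as sort_and_dedupe(vec); since the element type is deduced from the argument.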
std::unique only removes consecutive duplicate elements (which is necessary for it to run in linear time), so you should perform the sort first. The vector will remain sorted after the call to std::unique.
As already stated, std::unique requires a sorted container if it is to remove every duplicate. Additionally, std::unique doesn't actually remove elements from the container. Instead, it shifts the elements to keep toward the front and returns an iterator to the new logical end; the elements from that iterator onward are left in an unspecified state, and you are expected to call erase to actually remove them.
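To make both points concrete, here is a small sketch with two hypothetical helpers (the names are illustrative only). The first applies unique and erase without sorting, so only adjacent duplicates disappear. The second calls unique without erase and reports how many elements were kept, leaving the vector's size unchanged.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical helper: unique + erase WITHOUT sorting first.
// Only runs of adjacent equal elements are collapsed.
std::vector<int> dedupe_adjacent_only(std::vector<int> v) {
    auto new_end = std::unique(v.begin(), v.end());
    v.erase(new_end, v.end());
    return v;
}

// Hypothetical helper: calls std::unique but never erases.
// Returns the number of kept elements; v.size() stays the same.
std::size_t kept_count(std::vector<int>& v) {
    auto new_end = std::unique(v.begin(), v.end());
    return static_cast<std::size_t>(new_end - v.begin());
}
```

For an unsorted input such as {1, 2, 1, 1, 2}, the first helper yields {1, 2, 1, 2}, which is why the code in the question leaves duplicates behind.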