I have an array of integers from which I need to remove duplicates while maintaining the order of the first occurrence of each integer. I can see doing it like this, but I imagine there is a better way that makes better use of STL algorithms? The insertion is out of my control, so I cannot check for duplicates before inserting.
int unsortedRemoveDuplicates(std::vector<int>& numbers) {
    std::set<int> uniqueNumbers;
    std::vector<int>::iterator allItr = numbers.begin();
    std::vector<int>::iterator unique = allItr;
    std::vector<int>::iterator endItr = numbers.end();

    for (; allItr != endItr; ++allItr) {
        const bool isUnique = uniqueNumbers.insert(*allItr).second;
        if (isUnique) {
            *unique = *allItr;
            ++unique;
        }
    }

    const int duplicates = endItr - unique;
    numbers.erase(unique, endItr);
    return duplicates;
}
How can this be done using STL algorithms?
Here is what WilliamKF is searching for. It uses erase. This code is good for lists, but it isn't good for vectors: for vectors you should not erase elements one at a time from the middle of the container.
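The answer's code was not preserved above; a minimal sketch of the erase-based, list-friendly approach it describes (my reconstruction, not the original code) might look like:

```cpp
#include <iterator>
#include <list>

// Remove later duplicates in place, keeping the first occurrence of each value.
// erase() on a std::list is O(1) and leaves all other iterators valid,
// which is why this pattern suits lists but not vectors.
template <typename T>
void removeDuplicatesList(std::list<T>& values) {
    for (auto it = values.begin(); it != values.end(); ++it) {
        auto probe = std::next(it);
        while (probe != values.end()) {
            if (*probe == *it)
                probe = values.erase(probe);  // erase returns the next node
            else
                ++probe;
        }
    }
}
```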
What I have tried for vectors, without using sort, is this:
Somehow it looks like this would work. I tested it a bit; maybe somebody can confirm whether it really works. This solution doesn't need any greater-than operator. I mean, why use a greater-than operator for searching for unique elements? Usage for vectors:
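The vector code itself is missing from this copy; a sketch of the kind of approach described (only equality needed, one erase at the end; the function name is mine) could be:

```cpp
#include <algorithm>
#include <vector>

// Compact the vector in place, keeping only first occurrences.
// Needs only operator== (via std::find), no operator< and no sorting,
// and erase() is called exactly once at the end.
void removeDuplicatesKeepOrder(std::vector<int>& values) {
    auto writePos = values.begin();
    for (auto readPos = values.begin(); readPos != values.end(); ++readPos) {
        // Look for *readPos only in the already-kept prefix [begin, writePos).
        if (std::find(values.begin(), writePos, *readPos) == writePos) {
            *writePos = *readPos;
            ++writePos;
        }
    }
    values.erase(writePos, values.end());
}
```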
Here's something that handles POD and non-POD types with move support. Uses default operator== or a custom equality predicate. Does not require sorting/operator<, key generation, or a separate set. No idea if this is more efficient than the other methods described above.
Usage/tests:
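The answer's code and tests did not survive the copy; a sketch of the interface it describes (names and defaults are my assumptions) could be:

```cpp
#include <functional>
#include <utility>
#include <vector>

// Keep the first occurrence of each element, comparing with 'eq'
// (defaults to operator== via std::equal_to). Survivors are moved rather
// than copied, so non-POD types with expensive copies are handled too.
// No sorting, no operator<, no key generation, and no separate set.
template <typename T, typename Eq = std::equal_to<T>>
void uniqueStable(std::vector<T>& values, Eq eq = Eq{}) {
    auto out = values.begin();
    for (auto it = values.begin(); it != values.end(); ++it) {
        bool alreadyKept = false;
        for (auto kept = values.begin(); kept != out; ++kept) {
            if (eq(*kept, *it)) { alreadyKept = true; break; }
        }
        if (!alreadyKept) {
            if (out != it)
                *out = std::move(*it);  // move support for non-POD types
            ++out;
        }
    }
    values.erase(out, values.end());
}
```

Called as `uniqueStable(v);` for the default `operator==`, or with a lambda as a custom equality predicate.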
Here is a C++11 generic version that works with iterators and doesn't allocate additional storage. It may have the disadvantage of being O(n^2), but it is likely faster for smaller input sizes.
....
Sample Code: http://cpp.sh/5kg5n
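The snippet itself is missing here (and the cpp.sh link may no longer resolve); a plausible reconstruction of such a generic, allocation-free version, in the style of std::remove/std::unique, is:

```cpp
#include <algorithm>
#include <utility>

// Shift first occurrences to the front of [first, last) and return the new
// logical end; the caller erases the tail. O(n^2) comparisons, but no
// auxiliary storage of any kind.
template <typename ForwardIt>
ForwardIt removeDuplicates(ForwardIt first, ForwardIt last) {
    ForwardIt result = first;
    for (ForwardIt current = first; current != last; ++current) {
        // Kept elements live in [first, result); skip anything found there.
        if (std::find(first, result, *current) == result) {
            if (result != current)
                *result = std::move(*current);
            ++result;
        }
    }
    return result;
}
```

Usage: `v.erase(removeDuplicates(v.begin(), v.end()), v.end());`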
The naive way is to use std::set, as everyone tells you. It's overkill and has poor cache locality (slow). The smart* way is to use std::vector appropriately (make sure to see the footnote at the bottom). Then you can use it like:
*Note: That's the smart way in typical cases, where the number of duplicates isn't too high. For a more thorough performance analysis, see this related answer to a related question.
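The vector code did not make it into this copy; the standard idiom this answer appears to point at is sort + unique + erase, sketched below. Note that sorting discards the first-occurrence order the question asked for, which is presumably part of what the footnote qualifies:

```cpp
#include <algorithm>
#include <vector>

// The usual vector idiom: sort, then drop now-adjacent duplicates.
// Contiguous storage gives good cache locality, unlike a node-based set.
// Caveat: this does not preserve the original first-occurrence order.
void sortAndDeduplicate(std::vector<int>& values) {
    std::sort(values.begin(), values.end());
    values.erase(std::unique(values.begin(), values.end()), values.end());
}
```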
Sounds like a job for std::copy_if. Define a predicate that keeps track of elements that have already been processed and returns false if they have.
If you don't have C++11 support, you can use the clumsily named std::remove_copy_if and invert the logic.
This is an untested example:
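The example code is missing from this copy; a sketch of such a stateful predicate and the std::copy_if call through std::ref (identifiers are mine) might be:

```cpp
#include <algorithm>
#include <functional>
#include <iterator>
#include <set>
#include <vector>

// Stateful predicate: returns true only the first time a value is seen.
struct NotDuplicate {
    bool operator()(const int& value) {
        return seen.insert(value).second;
    }
    std::set<int> seen;
};

// Copy only first occurrences into a fresh vector. std::ref hands the
// algorithm a reference wrapper, so the predicate's state is shared even
// if the algorithm copies its predicate internally.
std::vector<int> copyUnique(const std::vector<int>& numbers) {
    std::vector<int> result;
    NotDuplicate pred;
    std::copy_if(numbers.begin(), numbers.end(),
                 std::back_inserter(result), std::ref(pred));
    return result;
}
```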
Then:

where an std::ref has been used to avoid potential problems with the algorithm internally copying what is a stateful functor, although std::copy_if does not place any requirements on the side effects of the functor being applied.

Fast and simple, C++11:
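The closing snippet is also missing; a likely shape for the "fast and simple" C++11 version (my guess at the author's intent) is the erase–remove idiom with a lambda over an unordered_set:

```cpp
#include <algorithm>
#include <unordered_set>
#include <vector>

// Erase-remove with a hash set: the lambda returns true (i.e. remove) for
// any value whose insertion fails, meaning it was already seen. Average
// O(n), versus O(n log n) for a std::set-based pass.
void dedupe(std::vector<int>& values) {
    std::unordered_set<int> seen;
    values.erase(std::remove_if(values.begin(), values.end(),
                     [&seen](int v) { return !seen.insert(v).second; }),
                 values.end());
}
```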