i have an array which might contain duplicate elements(more than two duplicates of an element). I wonder if it's possible to find and remove the duplicates in the array:
- without using Hash Table (strict requirement)
- without using a temporary secondary array. No restrictions on complexity.
P.S: This is not Home work question
Was asked to my friend in yahoo technical interview
Sort the source array. Find consecutive elements that are equal. (I.e. what
std::unique
does in C++ land). Total complexity is N lg N, or merely N if the input is already sorted.To remove duplicates, you can copy elements from later in the array over elements earlier in the array also in linear time. Simply keep a pointer to the new logical end of the container, and copy the next distinct element to that new logical end at each step. (Again, exactly like
std::unique
does (In fact, why not just download an implementation ofstd::unique
and do exactly what it does? :P))In-place duplicate removal that preserves the existing order of the list, in quadratic time:
The trick is to start the inner loop on
i + 1
and not increment the inner counter when you remove an element.The code is JavaScript,
splice(x, 1)
removes the element atx
.If order preservation isn't an issue, then you can do it quicker:
Which is linear, unless you count the sort, which you should, so it's of the order of the sort -- in most cases n × log(n).
In functional languages you can combine sorting and unicification (is that a real word?) in one pass. Let's take the standard quick sort algorithm:
If you want only unique entries, replace
left: all elements in xs smaller than or equal to x
with
left: all elements in xs smaller than x
This is a one-pass O(n log n) algorithm.
Example implementation in F#:
And a test in interactive mode:
O(NlogN) : Sort and replace consecutive same element with one copy.
O(N2) : Run nested loop to compare each element with the remaining elements in the array, if duplicate found, swap the duplicate with the element at the end of the array and decrease the array size by 1.
So this is a piece of cake.
Since it's an interview question it is usually expected by the interviewer to be asked precisions about the problem.
With no alternative storage allowed (that is O(1) storage allowed in that you'll probably use some counters / pointers), it seems obvious that a destructive operation is expected, it might be worth pointing it out to the interviewer.
Now the real question is: do you want to preserve the relative order of the elements ? ie is this operation supposed to be stable ?
Stability hugely impact the available algorithms (and thus the complexity).
The most obvious choice is to list Sorting Algorithms, after all, once the data is sorted, it's pretty easy to get unique elements.
But if you want stability, you cannot actually sort the data (since you could not get the "right" order back) and thus I wonder if it solvable in less than O(N**2) if stability is involved.