After reading this interesting question I was reminded of a tricky interview question I had once that I never satisfactorily answered:
You are given an array of n 32-bit unsigned integers where each element (except one) is repeated a multiple of three times. In O(n) time and using as little auxiliary space as possible, find the element of the array that does not appear a multiple of three times.
As an example, given this array:
1 1 2 2 2 3 3 3 3 3 3
We would output 1, while given the array
3 2 1 3 2 1 2 3 1 4 4 4 4
We would output 4.
This can easily be solved in O(n) time and O(n) space by using a hash table to count the frequencies of each element, though I strongly suspect that because the problem statement specifically mentioned that the array contains 32-bit unsigned integers that there is a much better solution (I'm guessing O(1) space).
Does anyone have any ideas on how to solve this?
It can be done in O(n) time and O(1) space.
Here is how you can do it with constant space in C#. I'm using the idea of "xor except with 3-state bits". For every set bit, the "xor" operation increments the corresponding 3-state value.
The final output will be the number whose binary representation has 1s in places that are either 1 or 2 in the final value.
The obvious solution to do it in constant space is to sort it using an in place algorithm, and then scan once over the array.
Sadly this requires usually O(n log n) time and O(1) space.
But as the entries have a limited key length (32 bit) you can use as sort algorithm radix sort (there exist in place radix sort, they are not stable, but that doesnt matter here). There you have O(n) time and O(1) space.
EDIT: Btw you could use this approach to find also ALL numbers that appear not a multiple of 3 times, in case you would allow that more than one number could have this property.
You're looking for an item with a rep-count that's non-zero (mod 3). I think I'd do it recursively:
Without even trying to optimize things beyond the basic algorithm (e.g., not worrying about storing only two bits per count), this seems to do pretty well. I've included code to generate a reasonably large test case (e.g., 1500+ items) and print out the sizes of the maps it's creating. At any given time, it seems to have a maximum of around 50 items in the maps it creates (i.e., it only uses two maps at a time, and the largest I've seen is around 25 items). Technically, as it stands I believe this is currently something like O(N log N), but if you switched to a hash-based container, I believe you'd expect O(N).