可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
In a sequence of length n, where n=2k+3, that is there are k unique numbers appeared twice
and three numbers appeared only once.
The question is: how to find the three unique numbers that appeared only once?
for example, in sequence 1 1 2 6 3 6 5 7 7 the three unique numbers are 2 3 5.
Note:
3<=n<1e6 and the number will range from 1 to 2e9
Memory limits: 1000KB , this implies that we can't store the whole sequence.
Method I have tried(Memory limit exceed):
I initialize a tree, and when read in one number I try to remove it from the tree, if the
remove returns false(not found), I add it to the tree. Finally, the tree has the three numbers. It works, but is Memory limit exceed.
I know how to find one or two such number(s) using bit manipulation. So I wonder if
we can find three using the same method(or some method similar)?
Method to find one/two number(s) appeared only once:
If there is one number appeared only once, we can apply XOR to the sequence to find it.
If there are two, we can first apply XOR to the sequence, then separate the sequence into 2
parts by one bit of the result that is 1, and again apply XOR to the 2 parts, and we will find the answer.
回答1:
You can do this in a similar way to the simpler cases of one and two different values.
We need two integers for each bit of the numbers (e.g. 32 bits). For each number, if that bit is zero, XOR the first integer with it. If it isn't, XOR the second integer with it.
Also, keep count of how many times you find a 1 or 0 in each position (we only need to check if this is even or odd, so keep a boolean).
After iterating through, our pairs of integers will be one of the following. The first number here represents an even count, the second an odd.
0, a^b^c
a^b, c
a^c, b
b^c, a
For each pair, check the even count integer. If it is zero, then we know the other integer is a^b^c, since no two of our results will be equal. Otherwise, we've found a value at the odd count integer.
public static int[] find3(int[] list) {
int[][] xors = new int[32][2];
boolean[] counts = new boolean[32];
for (int curr : list) {
for (int i = 0; i < 32; i++) {
xors[i][(curr & (1 << i)) >> i] ^= curr;
counts[i] ^= ((curr & (1 << i)) == (1 << i));
}
}
// this really shouldn't take so many lines
int[] ret = new int[3];
int found = 0;
for (int i = 0; i < 32; i++) {
int oddCount = xors[i][counts[i] ? 1 : 0];
int evenCount = xors[i][counts[i] ? 0 : 1];
if (evenCount != 0) { // avoid the 0, a^b^c case.
if (found == 0) {
ret[0] = oddCount;// a
ret[2] = evenCount;// b^c for now
found++;
} else if (found == 1 && ret[0] != oddCount) {
ret[1] = oddCount;// b
ret[2] ^= oddCount;// (b^c)^b == c
break;
}
}
}
return ret;
}
回答2:
For a more general version of this problem (without those silly limits):
You can do this in O(n) time and O(1) space without assuming any bounds, or iterating over all the bits, and using only O(1) time bit manipulation tricks like the XOR trick which worked for 2 missing numbers.
Here is (pseudo)code to find just one of the numbers:
// Given an array arr with 2k+3 numbers, k of which are repeated twice
// and the remaining three are distinct: a,b,c.
// returns one of a,b,c.
int FindUnique(int []arr) {
int s = 0; // This will ultimately hold a ^ b ^ c (bitwise XOR)
for (int i = 0; i < arr.Length; i++) {
s ^= arr[i];
}
int d = 0; // this holds diff(a,s) ^ diff(b,s) ^ diff(c,s)
for (int i = 0; i < arr.Length; i++) {
d ^= diff(arr[i],s);
}
int e = lowestBit(d); // This gives the position where one of a,b,c differs
// from the others.
int bucket1 = 0;
int bucket2 = 0;
for (int i = 0; i < arr.Length; i++) {
if (arr[i] & e) {
bucket1 ^= arr[i];
} else {
bucket2 ^= arr[i];
}
}
int count1 = 0;
int count2 = 0;
for (int i = 0; i < arr.Length; i++) {
if (arr[i] == bucket1) {
count1++;
}
if (arr[i] == bucket2) {
count2++;
}
}
if (count1 == 1) return bucket1;
return bucket2;
}
// return a number with the lowest bit of x ^ s set to 1 and rest 0.
// i.e. the lowest bit position where x and s differ.
int diff(int x, int s) {
return lowestBit(x ^ s);
}
// Returns a number with only the lowest bit of y set.
int lowestBit(int y) {
return y & ~(y-1);
}
The idea is as follows:
Say the numbers which appear once are a,b,c.
Now run the XOR through the array to get s = a XOR b XOR c.
Since the numbers are distinct, notice that s cannot be either a or b or c (as the other two will be equal then), thus there is at least one bit (not necessarily at the same position), where each of a,b,c differs from s.
In the two number case, we could see that s is non-zero and pick a bit which differentiated a & b and work with that.
We run into difficulties with that when we have three numbers, but we can still find a bit to differentiate one of the numbers out.
For each number x, find the lowest bit which differs from s. Consider the binary number in which only that bit is set to one and the rest are zero. Call this number diff(x).
Now if we compute diff(x) for each number and XOR them together, we get d = diff(a) XOR diff(b) XOR diff(c).
Notice that d cannot be zero.
Now find the lowest set bit of d. This bit position can be used to bucket out one of a,b,c, as not all of a,b,c can have the same bit at that position: if they did, then that bit of s which is the XOR of those three must be the same, but we ensured that we picked that bit of s to differ from at least one of the corresponding bits in a,b,c.
So we XOR again, differentiating on this bit, and check which of the two resulting numbers appears exactly once in the array. Once we find one number, we know how to deal with two numbers.
To find the diff just use the bithack: x & ~(x-1)
, which is a standard bit-hack and can be considered O(1) (instead of O(number of bits)).
回答3:
This is a classic question - I was actually just asked it a few weeks ago. To solve it, you take the number of possible distinct numbers that could appear, and allocate that many bits.
For example, if numbers in the list must be between 1-20, you allocate 20 bits - one for each number, and you initialize each bit as 0.
Then you traverse the list. Every time you see a number, flip the corresponding bit.
For example: With your example list of 2 6 3 6 5 7 7, we could allocate 7 bits (for 1 2 3 4 5 6 7). Then as we traverse the list, we would do the following:
- flip 2nd bit
- flip 6th bit
- flip 3rd bit
- flip 6th bit
- etc
Then once you're done traversing the list, you can read through the bits to find the three unique numbers. They will all be represented by '1' bits, and the other numbers will be represented by 0s.
You read through the list twice, which takes 2n time, which is O(n).
Edit: It is possible that the bounds will not be given. One solution, then, is to simply read through the list first to determine the bounds yourself - then it's still O(n).
However one problem which could occur is that the list could be very small, but some very large numbers - effectively making the range too big. For example:
1, 99999999999999999, 1, 99999999999999999, 2, 3, 4
Solving that problem would require a lot of memory due to the large number present in the list, because even though there are very few numbers, the range is very large and we are allocating bits according to the range.
The solution could then be adjusted to give a new solution as follows using a hashtable (although I'm not sure if this is permitted given the problem's "bit manipulation only" stipulation):
- Let
L
denote the original list, and C
denote a copy of it.
- Remove all duplicates from
C
(there are numerous ways of doing this efficiently).
- Create a hashtable
H
, and for each element in C
, insert a key/value pair <number
,pos
> into H
where number
is the current element in C
, and pos
is its position in C
. So, given a number that appears in L
, we can now use H
to find that number's position in C
.
- Allocate a number of bits equal to the size of
C
, and initialize those bits to 0.
- Traverse
L
. Each time we run accross a number, get its value from H
, and flip that bit in our bit list.
- Traverse the bit list - for each '1' bit, get the number from
C
which is at that position - that is one of the unique numbers.
回答4:
If a probabilistic solution will suffice then you could use a Bloom Filter.
Create two Bloom filters. The first (A) contains numbers that have been found at least one, and the second (B) contains numbers that have been found twice.
Pseudocode:
A = empty
B = empty
foreach x in the list
if x in A
add x to B
else
add x to A
foreach x in the list
if x in A
if !(x in B)
print x
If you use the full 1000KB then the probability of error would be ridiculously low.
回答5:
The problem gets harder and harder as you add more unique values, mainly because you can choose A,B,C such that A xor B xor C = 0. It gets harder and harder to detect if a subset of the values has the same checksum because it contains all the unique values, or because it omitted values which happened to xor to 0.
You can do 3 values in constant space and O(n*k) time, where k is the number of bits in the largest integer. (So O(n) time for your typical case: 32-bit integers.)
It would be interesting to find out if the time bound becomes non-linear in N as the number of unique values increases and you continue to require constant space.
//Special check for 0, because otherwise we don't know A xor B xor C != A xor B
if items unique-contains 0 then
return 0 ++ SubProblem2Unique(items - 0)
//Compute A xor B xor C
val x = fold xor items
//Try to find a split which separates A and B from C.
for i in 0..WORD_SIZE
//see if the checksum splits
val x1 = fold xor [e in items where e & (1<<i) == 0]
val x2 = x xor x1
if x1 == x or x2 == x then continue //ith bit was the same for A and B and C
//C is either x1 or x2
val C = if items unique-contains x1 then x1 else x2
return C ++ SubProblem2Unique(items - C)
throw InvalidInput
回答6:
Why not use a a hashset?
- If a number already exists, delete from hashset
- if a number does not exist, put into hashset
The final hashset contains only unique numbers.
Time: O(n)
Memory:o(k) where k is number of distinct elements.
With hashset approach, the solution is scalable and can be used to determine any number of unique elements in any given sequence.