We have 2 binary strings, X and Y, on 2 different computers, both of length n. The computers can communicate by sending bits to each other.
We have to build a randomized algorithm that checks whether there is an index i such that Xi = Yi. We may send only O(log n) bits between the computers.
If there is no such index, the algorithm must always return "false". If there is such an index, the algorithm must return "true" with probability at least 0.5.
Any suggestions?
If the bits are independent, sending any log(n) of them will give you the same chance of 'hitting' the equal bits. You will not be able to improve on this without additional information.
To elaborate a bit on Ivaylo's answer:
imagine the two strings are
A = 110110....0....00010
B = 001001....0....11101
both are of some large length n, and Ak = Bk for a single k, somewhere in the middle.
You basically want functions that transform A or B into O(log n)-bit numbers f(A) and g(B). E.g. sum is such a function. Say you sum the bits of A, i.e. f = sum. Also let g = sum . xor, i.e. XOR B with the all-ones string (flipping every bit) and then sum, so that g(B) counts the zeroes of B.
So if A was 110110 0 00010 (12 bits) and B was 001001 0 11101 (12 bits), then f(A) = 5 (binary 101, 3 bits) and g(B) = 6 (binary 110, 3 bits). You can compare them, and since they are different you can say "Aha! Then the strings must share a bit (there must be i s.t. Ai = Bi)!" and you will be right. However, while f(A) ≠ g(B) is sufficient evidence for a shared bit, it is not necessary: there could be an i s.t. Ai = Bi even though f(A) = g(B).
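A minimal sketch of the two fingerprints in Python (representing the strings as Python str is my own choice):

    def f(a):
        # f = sum: count the 1-bits of a
        return sum(int(bit) for bit in a)

    def g(b):
        # g = sum . xor: flip every bit of b, then sum, i.e. count the 0-bits
        return sum(1 - int(bit) for bit in b)

    A = "110110000010"
    B = "001001011101"

    print(f(A))  # 5 (binary 101, 3 bits)
    print(g(B))  # 6 (binary 110, 3 bits)
    # f(A) != g(B), so B cannot be the exact complement of A:
    # some index i must have A[i] == B[i].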
Let's look closer at the functions to see why. f(A) really counts how many ones there are in A; g(B) counts how many zeroes there are in B. Note that there is no index with Ai = Bi exactly when B is the bitwise complement of A, i.e. when A XOR B = 11...1. So assuming that f(A) = g(B) implies no shared index is the same as saying "any string that has as many zeroes as there are ones in another string XORs with that other string to all ones." Which is false: 100 and 110 fulfill the condition, but 100 XOR 110 is 010, not 111 (indeed the two strings agree in their first and last bits).
Now you can say: "Well, we just need to pick better f and g." However, the reason sum didn't work is fundamental and you cannot get away from it: f and g are hash functions, or in maths language, many-to-one mappings (they cannot be injective). The domain has size O(n) bits, i.e. O(2^n) elements, while the codomain (target set) has size O(log n) bits, i.e. O(2^(log n)) = O(n) elements, and 2^n is vastly larger than n.
Such many-to-one functions cannot be (uniquely) inverted, which is what you would actually need. Any time you invert f(A) or g(B) you get a one-to-many mapping. If f(A) is 2 and A has 3 bits, then A could be any of {110, 101, 011}. The size of the inverse image of f(A) is, on average, O(2^n / n). With no further information, the chance of guessing the value of A is O(n / 2^n), far below 0.5 in the general case.
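To make the one-to-many point concrete, here is a quick enumeration (a sketch, fixing n = 3 and f = sum):

    from itertools import product

    # All 3-bit strings whose ones-count is 2: the inverse image of f(A) = 2.
    preimage = [bits for bits in product((0, 1), repeat=3) if sum(bits) == 2]
    print(preimage)  # [(0, 1, 1), (1, 0, 1), (1, 1, 0)]
    # One fingerprint value, three candidate strings.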
And you have no further information, because if you did, you could incorporate it in f and g, but that would increase the size of their codomain.
I suggest reading up on information theory for further understanding of information loss, entropy, etc.
For stanm (I wrote this as an answer because the comment was too long):
It's a correct solution. The full algorithm is:
k = number of 1's in X.
Send k to computer 2.
l = number of 0's in Y.
If k = l, computer 2 answers "no", else "yes" (or 0 and 1).
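A sketch of the whole protocol in Python (function names are mine):

    def computer_1(x):
        # k = number of 1's in X; k fits in O(log n) bits
        return x.count("1")

    def computer_2(y, k):
        # l = number of 0's in Y; answer "yes" (True) iff k != l
        return k != y.count("0")

    def protocol(x, y):
        k = computer_1(x)        # the only message sent: O(log n) bits
        return computer_2(y, k)

    print(protocol("1101", "0010"))  # False: Y is the complement of X
    print(protocol("1101", "0110"))  # True: e.g. X2 = Y2 = 1
    print(protocol("10", "10"))      # False although X1 = Y1: the l = k failure case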
If there's no index i such that Xi = Yi, then Y is the bitwise complement of X, so l = k and computer 2 always answers "no" (or 0).
If such an index exists, the probability that computer 2 gives the wrong answer is the probability that it gets l = k.
The number of binary strings of length n that contain exactly k 0's is (n choose k), and the number of all binary strings of length n is 2^n.
So the probability that computer 2 fails even though it should return "yes" is (n choose k)/2^n, and you can prove that this number is always less than (or equal to) 1/2.
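A quick numeric check of that bound in Python (math.comb needs Python 3.8+; the worst k is the one maximizing the binomial coefficient):

    from math import comb

    for n in (1, 2, 5, 10, 100):
        worst = max(comb(n, k) for k in range(n + 1)) / 2 ** n
        print(n, worst)
    # prints 0.5, 0.5, 0.3125, 0.24609375, ~0.0796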
So finally we can conclude that: if such an index doesn't exist, computer 2 always answers "no"; if it does exist, the probability that computer 2 fails is at most 1/2, and therefore it answers "yes" with probability at least 1/2.