An interview question:
Given a function f(x) that returns 0 1/4 of the time and 1 3/4 of the time, write a function g(x), using f(x), that returns 0 half the time and 1 half the time.
My implementation is:
function g(x) {
    if (f(x) == 0) {        // probability 1/4
        var s = f(x);
        if (s == 1) {       // 1/4 * 3/4
            return s;       // 3/16
        } else {
            return g(x);    // retry
        }
    } else {                // probability 3/4
        var k = f(x);
        if (k == 0) {       // 3/4 * 1/4
            return k;       // 3/16
        } else {
            return g(x);    // retry
        }
    }
}
Am I right? What's your solution? (You can use any language.)
Your solution is correct, if somewhat inefficient and with more duplicated logic than necessary. Here is a Python implementation of the same algorithm in a cleaner form.
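The answer's original code is not shown here, so the following is only a minimal sketch of the same pair-comparison idea. The stand-in f() (built on Python's random module and taking no argument) is an assumption for illustration.

```python
import random


def f():
    # Assumed stand-in for the biased source: 0 with probability 1/4,
    # 1 with probability 3/4. The real f(x) is whatever you were given.
    return 0 if random.random() < 0.25 else 1


def g():
    # Draw a fresh pair of values until the two differ. The pairs (0, 1)
    # and (1, 0) each occur with probability 3/16, so returning the
    # second value of a differing pair is a fair coin flip.
    while True:
        first = f()
        second = f()
        if first != second:
            return second
```

A pair differs with probability 6/16 = 3/8, so this needs 8/3 pairs on average, or about 5.3 calls to f().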
If f() is expensive, you'd want to get more sophisticated and use the match/mismatch information to try to return with fewer calls to it. Here is the most efficient possible solution.
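The original code for this answer is not shown, so this is a hedged reconstruction of the interval-narrowing idea: keep an interval inside (0, 1), shrink it to its top quarter or bottom three quarters depending on f(), and stop once it lies entirely on one side of 0.5. The stand-in f() and the use of floating point are assumptions.

```python
import random


def f():
    # Assumed stand-in for the biased source: 0 with probability 1/4,
    # 1 with probability 3/4.
    return 0 if random.random() < 0.25 else 1


def g():
    lower, upper = 0.0, 1.0
    while True:
        # Stop as soon as the interval is entirely on one side of 0.5.
        if lower >= 0.5:
            return 1
        if upper <= 0.5:
            return 0
        # Boundary between the bottom 3/4 and the top 1/4 of the interval.
        split = lower + 0.75 * (upper - lower)
        if f() == 0:
            lower = split  # probability 1/4: keep the top quarter
        else:
            upper = split  # probability 3/4: keep the bottom three quarters
```

Each sub-interval is chosen with probability equal to its share of the current width, which is what keeps the implied number uniform on (0, 1).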
This takes about 2.6 calls to f() on average.

The way that it works is this. We're trying to pick a random number from 0 to 1, but we stop as soon as we know whether the number is below or above 0.5, that is, whether to return 0 or 1. We start knowing that the number is in the interval (0, 1). 3/4 of the numbers are in the bottom 3/4 of the interval, and 1/4 are in the top 1/4 of the interval. We decide which part we are in based on a call to f(x). This means that we are now in a smaller interval.

If we wash, rinse, and repeat enough times, we can determine our number as precisely as we like, and we will have an equal probability of winding up in any equal-sized region of the original interval. In particular, we have an even probability of winding up bigger than or less than 0.5.
If you wanted you could repeat the idea to generate an endless stream of bits one by one. This is, in fact, provably the most efficient way of generating such a stream, and is the source of the idea of entropy in information theory.
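If you call f(x) twice in a row, the following outcomes are possible (assuming that successive calls to f(x) are independent, identically distributed trials):

00 with probability 1/4 * 1/4 = 1/16
01 with probability 1/4 * 3/4 = 3/16
10 with probability 3/4 * 1/4 = 3/16
11 with probability 3/4 * 3/4 = 9/16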
01 and 10 occur with equal probability. So iterate until you get one of those cases, then return 0 or 1 appropriately:
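A minimal sketch of that loop, assuming the biased source is passed in as a zero-argument callable. The names fair_bit and biased, and the stand-in f(), are made up for illustration.

```python
import random
from typing import Callable


def f() -> int:
    # Assumed stand-in for the biased source: 0 with probability 1/4,
    # 1 with probability 3/4.
    return 0 if random.random() < 0.25 else 1


def fair_bit(biased: Callable[[], int]) -> int:
    # Draw fresh, non-overlapping pairs until the two values differ.
    while True:
        first, second = biased(), biased()
        if first != second:
            # 01 maps to 0 and 10 maps to 1; both pairs are equally likely.
            return first
```

Calling fair_bit(f) then yields 0 or 1 with equal probability, provided successive calls to f are independent.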
It might be tempting to call f(x) only once per iteration and keep track of the two most recent values, but that won't work. Suppose the very first roll is 1, with probability 3/4. You'd loop until the first 0, then return 1 (with probability 3/4).
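To see that bias concretely, here is a sketch of the tempting-but-wrong variant, using the convention that 10 returns 1 and 01 returns 0. The stand-in f() and the name g_sliding are assumptions for illustration.

```python
import random


def f():
    # Assumed stand-in for the biased source: 0 with probability 1/4,
    # 1 with probability 3/4.
    return 0 if random.random() < 0.25 else 1


def g_sliding():
    # Flawed: one call per iteration, comparing against the previous value.
    # The loop only exits when the value changes, so `previous` always
    # equals the very first roll, and that is what gets returned.
    previous = f()
    while True:
        current = f()
        if current != previous:
            return previous
        previous = current
```

Because the result is always the first roll, g_sliding() returns 1 about 3/4 of the time, no better than f() itself.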
Assuming

and requiring a function g[x] with the following assumptions

I believe the following definition of g[x] is sufficient (Mathematica)

or, alternatively, in C

This is based on the idea that invocations of {f[x], f[x+1]} would produce the following outcomes

Summing each of the outcomes, we have

where a sum of 1 represents 1/2 of the possible sum outcomes, with any other sum making up the other 1/2.
Edit: As bdk says, {0,0} is less likely than {1,1} because 1/4 * 1/4 < 3/4 * 3/4.

However, I am confused myself, because given the following definition for f[x] (Mathematica)

or, alternatively, in C

then the results obtained from executing f[x] and g[x]
seem to have the expected distribution.

Since each call to f() has a 3/4 chance of returning TRUE, with some algebra we can properly balance the odds. What we want is another function x() that returns TRUE with a balancing probability, so that f() && x() returns true 50% of the time.

So let's find the probability of x, p(x), given p(f) and our desired total probability (1/2):

p(f) * p(x) = 1/2, so p(x) = (1/2) / (3/4) = 2/3

So x() should return TRUE with a probability of 2/3, since 2/3 * 3/4 = 6/12 = 1/2.
Thus the following should work for g():
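The answer's own code is not shown, so the sketch below is only one way this could look. It assumes x() may draw on an independent uniform random source, which arguably sidesteps the requirement to build g from f alone. The helper x() and the stand-in f() are hypothetical.

```python
import random


def f():
    # Assumed stand-in for the biased source: 0 with probability 1/4,
    # 1 with probability 3/4.
    return 0 if random.random() < 0.25 else 1


def x():
    # Hypothetical balancing helper: 1 with probability 2/3, taken from an
    # independent uniform source rather than from f().
    return 1 if random.random() < 2 / 3 else 0


def g():
    # Returns 1 only when both f() and x() return 1: 3/4 * 2/3 = 1/2.
    return 1 if f() and x() else 0
```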
Taking this statement literally, f(x), if called four times, will always return 0 once and 1 three times. This is different from saying f(x) is a probabilistic function and the ratio of 0s to 1s will approach 1 to 3 (1/4 vs 3/4) over many iterations. If the first interpretation is valid, then the only valid function for f(x) that will meet the criteria regardless of where in the sequence you start from is the sequence 0111 repeating (or 1011 or 1101 or 1110, which are the same sequence from a different starting point). Given that constraint,
should suffice.
As already mentioned, your definition is not very precise about probability. Usually the requirement is not only that the probabilities are right but that the distribution is as well. Otherwise you could simply write a g(x) that returns 1,0,1,0,1,0,1,0: it returns the two values 50/50, but the numbers won't be random.

Another cheating approach might be:

This solution would be better than all the others, since it calls f(x) only once. But the results will not be very random.