I once got the following as an interview question:
I'm thinking of a positive integer n. Come up with an algorithm that can guess it in O(lg n) queries. Each query is a number of your choosing, and I will answer either "lower," "higher," or "correct."
This problem can be solved by a modified binary search, in which you listing powers of two until you find one that exceeds n, then run a standard binary search over that range. What I think is so cool about this is that you can search an infinite space for a particular number faster than just brute-force.
The question I have, though, is a slight modification of this problem. Instead of picking a positive integer, suppose that I pick an arbitrary rational number between zero and one. My question is: what algorithm can you use to most efficiently determine which rational number I've picked?
Right now, the best solution I have can find p/q in at most O(q) time by implicitly walking the Stern-Brocot tree, a binary search tree over all the rationals. However, I was hoping to get a runtime closer to the runtime that we got for the integer case, maybe something like O(lg (p + q)) or O(lg pq). Does anyone know of a way to get this sort of runtime?
I initially considered using a standard binary search of the interval [0, 1], but this will only find rational numbers with a non-repeating binary representation, which misses almost all of the rationals. I also thought about using some other way of enumerating the rationals, but I can't seem to find a way to search this space given just greater/equal/less comparisons.
Okay, here's my answer using continued fractions alone.
First let's get some terminology here.
Let X = p/q be the unknown fraction.
Let Q(X,p/q) = sign(X - p/q) be the query function: if it is 0, we've guessed the number, and if it's +/- 1 that tells us the sign of our error.
The conventional notation for continued fractions is A = [a0; a1, a2, a3, ... ak]
= a0 + 1/(a1 + 1/(a2 + 1/(a3 + 1/( ... + 1/ak) ... )))
We'll follow the following algorithm for 0 < p/q < 1.
Initialize Y = 0 = [ 0 ], Z = 1 = [ 1 ], k = 0.
Outer loop: The preconditions are that:
Y and Z are continued fractions of k+1 terms which are identical except in the last element, where they differ by 1, so that Y = [y0; y1, y2, y3, ... yk] and Z = [y0; y1, y2, y3, ... yk + 1]
(-1)k(Y-X) < 0 < (-1)k(Z-X), or in simpler terms, for k even, Y < X < Z and for k odd, Z < X < Y.
Extend the degree of the continued fraction by 1 step without changing the values of the numbers. In general, if the last terms are yk and yk + 1, we change that to [... yk, yk+1=∞] and [... yk, zk+1=1]. Now increase k by 1.
Inner loops: This is essentially the same as @templatetypedef's interview question about the integers. We do a two-phase binary search to get closer:
Inner loop 1: yk = ∞, zk = a, and X is between Y and Z.
Double Z's last term: Compute M = Z but with mk = 2*a = 2*zk.
Query the unknown number: q = Q(X,M).
If q = 0, we have our answer and go to step 17 .
If q and Q(X,Y) have opposite signs, it means X is between Y and M, so set Z = M and go to step 5.
Otherwise set Y = M and go to the next step:
Inner loop 2. yk = b, zk = a, and X is between Y and Z.
If a and b differ by 1, swap Y and Z, go to step 2.
Perform a binary search: compute M where mk = floor((a+b)/2, and query q = Q(X,M).
If q = 0, we're done and go to step 17.
If q and Q(X,Y) have opposite signs, it means X is between Y and M, so set Z = M and go to step 11.
Otherwise, q and Q(X,Z) have opposite signs, it means X is between Z and M, so set Y = M and go to step 11.
Done: X = M.
A concrete example for X = 16/113 = 0.14159292
At each step of computing M, the range of the interval reduces. It is probably fairly easy to prove (though I won't do this) that the interval reduces by a factor of at least 1/sqrt(5) at each step, which would show that this algorithm is O(log q) steps.
Note that this can be combined with templatetypedef's original interview question and apply towards any rational number p/q, not just between 0 and 1, by first computing Q(X,0), then for either positive/negative integers, bounding between two consecutive integers, and then using the above algorithm for the fractional part.
When I have a chance next, I will post a python program that implements this algorithm.
edit: also, note that you don't have to compute the continued fraction each step (which would be O(k), there are partial approximants to continued fractions that can compute the next step from the previous step in O(1).)
edit 2: Recursive definition of partial approximants:
If Ak = [a0; a1, a2, a3, ... ak] = pk/qk, then pk = akpk-1 + pk-2, and qk = akqk-1 + qk-2. (Source: Niven & Zuckerman, 4th ed, Theorems 7.3-7.5. See also Wikipedia)
Example: [0] = 0/1 = p0/q0, [0; 7] = 1/7 = p1/q1; so [0; 7, 16] = (16*1+0)/(16*7+1) = 16/113 = p2/q2.
This means that if two continued fractions Y and Z have the same terms except the last one, and the continued fraction excluding the last term is pk-1/qk-1, then we can write Y = (ykpk-1 + pk-2) / (ykqk-1 + qk-2) and Z = (zkpk-1 + pk-2) / (zkqk-1 + qk-2). It should be possible to show from this that |Y-Z| decreases by at least a factor of 1/sqrt(5) at each smaller interval produced by this algorithm, but the algebra seems to be beyond me at the moment. :-(
Here's my Python program:
and a sample output for
ratguess(makeQ(33102,113017), True, 20)
:Since Python handles biginteger math from the start, and this program uses only integer math (except for the interval calculations), it should work for arbitrary rationals.
edit 3: Outline of proof that this is O(log q), not O(log^2 q):
First note that until the rational number is found, the # of steps nk for each new continued fraction term is exactly 2b(a_k)-1 where b(a_k) is the # of bits needed to represent a_k = ceil(log2(a_k)): it's b(a_k) steps to widen the "net" of the binary search, and b(a_k)-1 steps to narrow it). See the example above, you'll note that the # of steps is always 1, 3, 7, 15, etc.
Now we can use the recurrence relation qk = akqk-1 + qk-2 and induction to prove the desired result.
Let's state it in this way: that the value of q after the Nk = sum(nk) steps required for reaching the kth term has a minimum: q >= A*2cN for some fixed constants A,c. (so to invert, we'd get that the # of steps N is <= (1/c) * log2 (q/A) = O(log q).)
Base cases:
This implies A = 1, c = 1/2 could provide desired bounds. In reality, q may not double each term (counterexample: [0; 1, 1, 1, 1, 1] has a growth factor of phi = (1+sqrt(5))/2) so let's use c = 1/4.
Induction:
for term k, qk = akqk-1 + qk-2. Again, for the nk = 2b-1 steps needed for this term, ak >= 2b-1 = 2(nk-1)/2.
So akqk-1 >= 2(Nk-1)/2 * qk-1 >= 2(nk-1)/2 * A*2Nk-1/4 = A*2Nk/4/sqrt(2)*2nk/4.
Argh -- the tough part here is that if ak = 1, q may not increase much for that one term, and we need to use qk-2 but that may be much smaller than qk-1.
You can sort rational numbers in a given interval by for example the pair (denominator, numerator). Then to play the game you can
[0, N]
using the doubling-step approach[a, b]
shoot for the rational with smallest denominator in the interval that is the closest to the center of the intervalthis is however probably still
O(log(num/den) + den)
(not sure and it's too early in the morning here to make me think clearly ;-) )Here is yet another way to do it. If there is sufficient interest, I will try to fill out the details tonight, but I can't right now because I have family responsibilities. Here is a stub of an implementation that should explain the algorithm:
And here is the explanation. What
best_continued_fraction(x, bound)
should do is find the last continued fraction approximation tox
with the denominator at mostbound
. This algorithm will take polylog steps to complete and finds very good (though not always the best) approximations. So for eachbound
we'll get something close to a binary search through all possible fractions of that size. Occasionally we won't find a particular fraction until we increase the bound farther than we should, but we won't be far off.So there you have it. A logarithmic number of questions found with polylog work.
Update: And full working code.
It appears slightly more efficient in guesses than the previous solution, and does a lot fewer operations. For 101/1024 it required 19 guesses and 251 operations. For .98765 it needed 27 guesses and 623 operations. For 0.0123456789 it required 66 guesses and 889 operations. And for giggles and grins, for 0.0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789 (that's 10 copies of the previous one) it required 665 guesses and 23289 operations.
Okay, I think I figured out an O(lg2 q) algorithm for this problem that is based on Jason S's most excellent insight about using continued fractions. I thought I'd flesh the algorithm out all the way right here so that we have a complete solution, along with a runtime analysis.
The intuition behind the algorithm is that any rational number p/q within the range can be written as
For appropriate choices of ai. This is called a continued fraction. More importantly, though these ai can be derived by running the Euclidean algorithm on the numerator and denominator. For example, suppose we want to represent 11/14 this way. We begin by noting that 14 goes into eleven zero times, so a crude approximation of 11/14 would be
Now, suppose that we take the reciprocal of this fraction to get 14/11 = 1 3/11. So if we write
We get a slightly better approximation to 11/14. Now that we're left with 3 / 11, we can take the reciprocal again to get 11/3 = 3 2/3, so we can consider
Which is another good approximation to 11/14. Now, we have 2/3, so consider the reciprocal, which is 3/2 = 1 1/2. If we then write
We get another good approximation to 11/14. Finally, we're left with 1/2, whose reciprocal is 2/1. If we finally write out
which is exactly the fraction we wanted. Moreover, look at the sequence of coefficients we ended up using. If you run the extended Euclidean algorithm on 11 and 14, you get that
It turns out that (using more math than I currently know how to do!) that this isn't a coincidence and that the coefficients in the continued fraction of p/q are always formed by using the extended Euclidean algorithm. This is great, because it tells us two things:
Given these two facts, we can come up with an algorithm to recover any rational number p/q, not just those between 0 and 1, by applying the general algorithm for guessing arbitrary integers n one at a time to recover all of the coefficients in the continued fraction for p/q. For now, though, we'll just worry about numbers in the range (0, 1], since the logic for handling arbitrary rational numbers can be done easily given this as a subroutine.
As a first step, let's suppose that we want to find the best value of a1 so that 1 / a1 is as close as possible to p/q and a1 is an integer. To do this, we can just run our algorithm for guessing arbitrary integers, taking the reciprocal each time. After doing this, one of two things will have happened. First, we might by sheer coincidence discover that p/q = 1/k for some integer k, in which case we're done. If not, we'll find that p/q is sandwiched between 1/(a1 - 1) and 1/a0 for some a1. When we do this, then we start working on the continued fraction one level deeper by finding the a2 such that p/q is between 1/(a1 + 1/a2) and 1/(a1 + 1/(a2 + 1)). If we magically find p/q, that's great! Otherwise, we then go one level down further in the continued fraction. Eventually, we'll find the number this way, and it can't take too long. Each binary search to find a coefficient takes at most O(lg(p + q)) time, and there are at most O(lg(p + q)) levels to the search, so we need only O(lg2(p + q)) arithmetic operations and probes to recover p/q.
One detail I want to point out is that we need to keep track of whether we're on an odd level or an even level when doing the search because when we sandwich p/q between two continued fractions, we need to know whether the coefficient we were looking for was the upper or the lower fraction. I'll state without proof that for ai with i odd you want to use the upper of the two numbers, and with ai even you use the lower of the two numbers.
I am almost 100% confident that this algorithm works. I'm going to try to write up a more formal proof of this in which I fill in all of the gaps in this reasoning, and when I do I'll post a link here.
Thanks to everyone for contributing the insights necessary to get this solution working, especially Jason S for suggesting a binary search over continued fractions.
Remember that any rational number in (0, 1) can be represented as a finite sum of distinct (positive or negative) unit fractions. For example, 2/3 = 1/2 + 1/6 and 2/5 = 1/2 - 1/10. You can use this to perform a straight-forward binary search.
I think I found an O(log^2(p + q)) algorithm.
To avoid confusion in the next paragraph, a "query" refers to when the guesser gives the challenger a guess, and the challenger responds "bigger" or "smaller". This allows me to reserve the word "guess" for something else, a guess for p + q that is not asked directly to the challenger.
The idea is to first find p + q, using the algorithm you describe in your question: guess a value k, if k is too small, double it and try again. Then once you have an upper and lower bound, do a standard binary search. This takes O(log(p+q)T) queries, where T is an upper bound for the number of queries it takes to check a guess. Let's find T.
We want to check all fractions r/s with r + s <= k, and double k until k is sufficiently large. Note that there are O(k^2) fractions you need to check for a given value of k. Build a balanced binary search tree containing all these values, then search it to determine if p/q is in the tree. It takes O(log k^2) = O(log k) queries to confirm that p/q is not in the tree.
We will never guess a value of k greater than 2(p + q). Hence we can take T = O(log(p+q)).
When we guess the correct value for k (i.e., k = p + q), we will submit the query p/q to the challenger in the course of checking our guess for k, and win the game.
Total number of queries is then O(log^2(p + q)).