Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.
- What is a simple solution?
- What is an effective solution to reduce memory usage or run on a slower CPU?
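For the first bullet, a common simple answer is rejection sampling. Two calls to the 1-5 generator give 25 equally likely outcomes. Keeping only the first 21 of them (three full groups of seven) leaves a uniform choice among seven. This is a minimal sketch. The question does not say where the 1-5 generator comes from, so it assumes a stand-in `rand5` built on Python's `random.randint`.

```python
import random


def rand5() -> int:
    """Stand-in for the given generator: a uniform integer in 1..5 (assumption)."""
    return random.randint(1, 5)


def rand7() -> int:
    """Uniform integer in 1..7 built only from rand5(), by rejection sampling."""
    while True:
        # Two draws read as a two-digit base-5 number: uniform on 0..24.
        value = 5 * (rand5() - 1) + (rand5() - 1)
        # Keep 0..20 (three full groups of seven) so each residue mod 7 gets 3 of the 21 values.
        if value < 21:
            return value % 7 + 1
        # 21..24 would favour residues 0..3, so discard both draws and try again.


if __name__ == "__main__":
    counts = [0] * 7
    for _ in range(70_000):
        counts[rand7() - 1] += 1
    print(counts)  # each entry should be close to 10_000
```

Each attempt uses two calls and succeeds with probability 21/25. On average this sketch therefore spends 50/21 ≈ 2.38 calls to `rand5` per output. Reducing that waste is the subject of the second bullet.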
Suppose we add the constraint of giving the most efficient answer, i.e. one that, given an input stream $I$ of $m$ uniformly distributed integers from 1-5, outputs a stream $O$ of uniformly distributed integers from 1-7 that is as long as possible relative to $m$. Call this length $L(m)$. The simplest way to analyse this is to treat the streams $I$ and $O$ as 5-ary and 7-ary numbers respectively. This is achieved by the main answer's idea of taking the stream $a_1, a_2, a_3, \dots \to a_1 + 5a_2 + 5^2 a_3 + \dots$, and similarly for stream $O$.
Then, for a section of the input stream of length $m$, choose $n$ such that $5^m - 7^n = c$, where $c > 0$ and is as small as possible. There is a uniform map from the input stream of length $m$ to the integers from $1$ to $5^m$, and another uniform map from the integers from $1$ to $7^n$ to the output stream of length $n$. We may have to lose a few cases from the input stream when the mapped integer exceeds $7^n$. So this gives a value for $L(m)$ of around $m(\log 5/\log 7)$, which is approximately $0.827m$.
The difficulty with the above analysis is that the equation $5^m - 7^n = c$ is not easy to solve exactly, and that we lose efficiency whenever the uniform value from $1$ to $5^m$ exceeds $7^n$. The question is how close we can get to the best possible value of $m(\log 5/\log 7)$. For example, when this number is close to an integer, can we find a way to achieve exactly that integral number of output values?
If $5^m - 7^n = c$, then from the input stream we effectively generate a uniform random number from $0$ to $5^m - 1$ and don't use any values of $7^n$ or higher. However, these values can be rescued and used again, because they effectively generate a uniform random number over a range of $5^m - 7^n$ values. So we can then try to convert them into 7-ary numbers to create more output values.
Let $T_7(X)$ be the average length of the output sequence of random(1-7) integers derived from a uniform input of size $X$, and assume that $5^m = 7^{n_0} + 7^{n_1} + 7^{n_2} + \dots + 7^{n_r} + s$, with $s < 7$. Then
$$T_7(5^m) = \frac{n_0 \cdot 7^{n_0}}{5^m} + \frac{5^m - 7^{n_0}}{5^m}\, T_7(5^m - 7^{n_0}),$$
since we get a sequence of length $n_0$ with probability $7^{n_0}/5^m$, and a residual uniform input of size $5^m - 7^{n_0}$ with probability $(5^m - 7^{n_0})/5^m$. If we just keep substituting, we obtain:
$$T_7(5^m) = \frac{1}{5^m}\sum_{i=0}^{r} n_i 7^{n_i}.$$
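The rescue step and the greedy split $5^m = 7^{n_0} + 7^{n_1} + \dots + s$ can be sketched as a loop that turns one uniform input into output digits. The sketch assumes the $m$ input values have already been packed into a single integer that is uniform on $0, \dots, 5^m - 1$. It emits 0-based digits (0-6), and `block_outputs` is a hypothetical name.

```python
from collections import Counter


def block_outputs(value: int, size: int) -> list[int]:
    """Turn one value that is uniform on range(size) into uniform base-7 digits (0..6).

    If the value falls in the largest block of 7**n values, emit its n base-7
    digits. Otherwise it is uniform on the leftover values, so rescue it and
    repeat on that smaller range. Returns [] once fewer than 7 values remain.
    """
    while size >= 7:
        n, power = 0, 1
        while power * 7 <= size:
            power *= 7
            n += 1
        if value < power:
            digits = []
            for _ in range(n):
                value, d = divmod(value, 7)
                digits.append(d)
            return digits
        # Rescue: value - power is uniform on the size - power leftover values.
        value -= power
        size -= power
    return []


if __name__ == "__main__":
    # Exhaustive check for m = 3 draws: all 125 input values are equally likely.
    outputs = [block_outputs(v, 5 ** 3) for v in range(5 ** 3)]
    print(Counter(len(o) for o in outputs))  # Counter({2: 98, 1: 21, 0: 6})
    print(sum(len(o) for o in outputs) / 125)  # 1.736
```

For $m = 3$ we have $125 = 49 + 49 + 7 + 7 + 7 + 6$. The recursion for $T_7$ therefore gives $(2\cdot 49 + 2\cdot 49 + 3\cdot 7)/125 = 217/125 = 1.736$ outputs per block, which matches the exhaustive count.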
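Another way of putting this is that, if $5^m = \sum_k d_k 7^k$ is the base-7 expansion of $5^m$, then
$$T_7(5^m) = \frac{1}{5^m}\sum_k k\, d_k 7^k.$$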
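Both forms can be checked with exact fractions. In this sketch, `t7_recursive` applies the recursion for $T_7$ literally and stops when fewer than 7 values remain. `t7_digits` computes $\frac{1}{X}\sum_k k\,d_k 7^k$ from the base-7 digits $d_k$ of $X$, where each greedy block $7^{n_i}$ contributes one unit to the digit $d_{n_i}$. Both names are hypothetical.

```python
from fractions import Fraction


def t7_recursive(x: int) -> Fraction:
    """T7(x): average output length for a uniform input of size x, via the recursion."""
    if x < 7:
        return Fraction(0)  # fewer than 7 values left (the s < 7 residue): no output
    n, power = 0, 1
    while power * 7 <= x:
        power *= 7
        n += 1
    # n outputs with probability power/x; otherwise a uniform residual of size x - power.
    return Fraction(n * power, x) + Fraction(x - power, x) * t7_recursive(x - power)


def t7_digits(x: int) -> Fraction:
    """Same quantity from the base-7 digits d_k of x: sum(k * d_k * 7**k) / x."""
    total, k, rest = 0, 0, x
    while rest:
        rest, d = divmod(rest, 7)
        total += k * d * 7 ** k
        k += 1
    return Fraction(total, x)


if __name__ == "__main__":
    print(t7_recursive(5 ** 3), t7_digits(5 ** 3))  # 217/125 217/125
    for m in range(1, 30):
        assert t7_recursive(5 ** m) == t7_digits(5 ** m)
```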
The best possible case is my original one above, where $5^m = 7^n + s$ with $s < 7$. Then
$$T_7(5^m) = \frac{n \cdot 7^n}{7^n + s} = n + o(1) = m\frac{\log 5}{\log 7} + o(1),$$
as before. The worst case is when we can only find $k$ and $s$ such that $5^m = 7k + s$.
Other cases are somewhere in between. It would be interesting to see how well we can do for very large $m$, i.e. how small we can make the error term
$$e(m) = m\frac{\log 5}{\log 7} - T_7(5^m).$$
It seems impossible to achieve $e(m) = o(1)$ in general, but hopefully we can prove $e(m) = o(m)$. The whole thing then rests on the distribution of the 7-ary digits of $5^m$ for various values of $m$. I'm sure there is a lot of theory out there that covers this; I may have a look and report back at some point.
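The size of the error term can be explored numerically. This sketch assumes the error is measured against the bound $m\log 5/\log 7$, i.e. $e(m) = m\log 5/\log 7 - T_7(5^m)$. It computes $T_7$ exactly from the base-7 digits of $5^m$. `t7` and `error_term` are hypothetical names. The sketch only tabulates values and proves nothing about the asymptotics.

```python
import math
from fractions import Fraction


def t7(x: int) -> Fraction:
    """Average output length for a uniform input of size x: sum(k * d_k * 7**k) / x over base-7 digits."""
    total, k, rest = 0, 0, x
    while rest:
        rest, d = divmod(rest, 7)
        total += k * d * 7 ** k
        k += 1
    return Fraction(total, x)


def error_term(m: int) -> float:
    """Assumed definition: e(m) = m*log(5)/log(7) - T7(5**m)."""
    return m * math.log(5) / math.log(7) - float(t7(5 ** m))


if __name__ == "__main__":
    for m in (10, 50, 100, 200, 400, 800):
        e = error_term(m)
        print(f"m={m:4d}  e(m)={e:.3f}  e(m)/m={e / m:.5f}")
```

Every block has length at most $\lfloor m\log 5/\log 7 \rfloor$, so under this definition $e(m)$ is always positive. The table shows how it behaves as $m$ grows.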
I know this has been answered, but this seems to work OK, though I can't tell you whether it has a bias. My 'testing' suggests it is, at least, reasonable.
Perhaps Adam Rosenfield would be kind enough to comment?
My (naive?) idea is this:
Accumulate rand5's until there are enough random bits to make a rand7. This takes at most 2 rand5's. To get the rand7 number I use the accumulated value mod 7.
To avoid the accumulator overflowing, and since the result is taken mod 7 anyway, I reduce the accumulator mod 7:
The rand7() function follows:
(I let the range of rand5 be 0-4 and rand7 is likewise 0-6.)
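Sampling can hide small biases. For schemes built from a fixed number of draws, it can help to count outcomes exactly. The sketch below does this for one simple, stateless variant that is assumed here only for illustration: combine two 0-4 draws as `5*a + b` and take the result mod 7. Because 25 is not a multiple of 7, this variant is not uniform. It does not model the accumulator described above, which carries its state between calls, so it says nothing definite about that version. `residue_counts` is a hypothetical name.

```python
from collections import Counter
from itertools import product


def residue_counts(draws: int) -> Counter:
    """Exact counts of (base-5 value of `draws` 0..4 outputs) mod 7 over all 5**draws outcomes."""
    counts = Counter()
    for digits in product(range(5), repeat=draws):  # every equally likely output sequence
        value = 0
        for d in digits:
            value = value * 5 + d
        counts[value % 7] += 1
    return counts


if __name__ == "__main__":
    print(sorted(residue_counts(2).items()))
    # [(0, 4), (1, 4), (2, 4), (3, 4), (4, 3), (5, 3), (6, 3)]
```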
Edit: Added results for 100 million trials.
'Real' rand functions mod 5 or 7:
rand5 : avg=1.999802 0:20003944 1:19999889 2:20003690 3:19996938 4:19995539
rand7 : avg=3.000111 0:14282851 1:14282879 2:14284554 3:14288546 4:14292388 5:14288736 6:14280046
My rand7:
randt : avg=3.000080 0:14288793 1:14280135 2:14287848 3:14285277 4:14286341 5:14278663 6:14292943
Average looks ok and number distributions look ok too.
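This kind of check can be repeated with a small harness that prints the sample mean and per-value counts in the same layout as the results above. This is a sketch: `summarize` is a hypothetical name, and the source is a stand-in `rand5` built on `random.randint(0, 4)`. The generator under test is a placeholder rejection-sampling `rand7`; any 0-6 generator can be passed in instead.

```python
import random
from collections import Counter
from typing import Callable


def rand5() -> int:
    """Stand-in source: uniform integer in 0..4 (0-based, as in the answer above)."""
    return random.randint(0, 4)


def rand7() -> int:
    """Placeholder generator under test (rejection sampling), uniform in 0..6."""
    while True:
        value = 5 * rand5() + rand5()  # uniform on 0..24
        if value < 21:
            return value % 7


def summarize(name: str, gen: Callable[[], int], trials: int) -> str:
    """Report the sample mean and per-value counts, e.g. 'rand7 : avg=3.000111 0:... 1:...'."""
    counts = Counter(gen() for _ in range(trials))
    avg = sum(v * c for v, c in counts.items()) / trials
    cells = " ".join(f"{v}:{counts[v]}" for v in sorted(counts))
    return f"{name} : avg={avg:.6f} {cells}"


if __name__ == "__main__":
    print(summarize("rand7", rand7, 1_000_000))
```

A matching average and matching counts check only the distribution of single outputs. They do not detect correlation between successive outputs, so a check like this cannot rule out every kind of bias on its own.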
By using a rolling total, you can both maintain an even distribution and avoid discarding any values from the random sequence. Both these problems are an issue with the simplistic rand(5)+rand(5)...-type solutions. The following Python code shows how to implement it (most of this is proving the distribution), and this output shows the results:
A simplistic rand(5)+rand(5), ignoring those cases where this returns more than 6, has a typical variation of 18%, 100 times that of the method shown above. And, on the advice of Nixuz, I've cleaned the script up so you can just extract and use the rand7... stuff:
As long as there aren't seven possibilities left to choose from, draw another random number, which multiplies the number of possibilities by five. In Perl:
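The idea above can be sketched in Python as follows. This illustrates the described technique and is not that answer's Perl code. The sketch keeps a value that is uniform over some number of possibilities and multiplies in another `rand5` draw while fewer than seven remain. Otherwise it emits one output in 0..6. It also keeps the quotient and any leftover values for later outputs, which is an assumption about how the idea would be used in a stream. `rand7_stream` and the `rand5` stand-in are hypothetical names.

```python
import random
from typing import Callable, Iterator


def rand5() -> int:
    """Stand-in source: uniform integer in 0..4."""
    return random.randint(0, 4)


def rand7_stream(source: Callable[[], int] = rand5) -> Iterator[int]:
    """Yield uniform integers in 0..6, recycling leftover randomness between outputs.

    Invariant: `value` is uniform on range(possibilities).
    """
    value, possibilities = 0, 1
    while True:
        # Fewer than seven possibilities: draw again, multiplying them by five.
        while possibilities < 7:
            value = value * 5 + source()
            possibilities *= 5
        q, r = divmod(possibilities, 7)
        if value < 7 * q:
            # value % 7 is uniform on 0..6; value // 7 is uniform on range(q), independently.
            yield value % 7
            value, possibilities = value // 7, q
        else:
            # Landed in the r leftover values: keep them as a smaller uniform range.
            value, possibilities = value - 7 * q, r


if __name__ == "__main__":
    stream = rand7_stream()
    print([next(stream) + 1 for _ in range(10)])  # ten values on a 1..7 scale
```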
There you go, uniform distribution and zero rand5 calls.
You need to set the seed beforehand.