Expand a random range from 1–5 to 1–7

Posted 2018-12-31 12:10

Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.

  1. What is a simple solution?
  2. What is an effective solution to reduce memory usage or run on a slower CPU?

30 answers
孤独寂梦人
#2 · 2018-12-31 12:42

Consider the additional constraint of giving the most efficient answer: one that, given an input stream I of m uniformly distributed integers from 1-5, outputs a stream O of uniformly distributed integers from 1-7 that is as long as possible relative to m, say of length L(m).

The simplest way to analyse this is to treat the streams I and O as 5-ary and 7-ary numbers respectively. This is achieved by the main answer's idea of taking the stream a1, a2, a3,... -> a1+5*a2+5^2*a3+.. and similarly for stream O.
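To make the base-5 reading concrete, here is a minimal Python sketch. The helper name stream_to_int is made up for illustration, and it assumes the inputs have already been shifted from 1-5 to 0-4 so that they act as base-5 digits.

```python
def stream_to_int(digits, base=5):
    """Read a1, a2, a3, ... as a1 + base*a2 + base^2*a3 + ... (least significant first)."""
    value = 0
    for position, digit in enumerate(digits):
        if not 0 <= digit < base:
            raise ValueError("digit out of range for this base")
        value += digit * base ** position
    return value


# Three rand5 outputs shifted to 0-4 give one integer in 0 .. 5**3 - 1.
print(stream_to_int([3, 0, 4]))  # 3 + 5*0 + 25*4 = 103
```

The same helper with base=7 reads an output stream as a 7-ary number.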

Then, for a section of the input stream of length m, choose n such that 5^m-7^n=c, where c>0 is as small as possible. There is then a uniform map from input sections of length m to the integers from 1 to 5^m, and another uniform map from the integers from 1 to 7^n to output sequences of length n. We may have to discard a few input cases, namely those where the mapped integer exceeds 7^n.

So this gives a value for L(m) of around m (log5/log7), which is approximately 0.83m.
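As a rough illustration of choosing n for a given m, the sketch below finds the largest n with 7^n below 5^m and reports the leftover c. The function name best_n and the sample values of m are my own choices, not part of the answer.

```python
import math


def best_n(m):
    """Largest n with 7**n <= 5**m, together with the leftover c = 5**m - 7**n."""
    n = 0
    while 7 ** (n + 1) <= 5 ** m:
        n += 1
    return n, 5 ** m - 7 ** n


ratio = math.log(5) / math.log(7)
for m in (2, 5, 11):
    n, c = best_n(m)
    # n is the integer part of m * log5/log7; c counts the inputs that overshoot 7**n.
    print(m, n, c, round(m * ratio, 3))
```

For m = 11, for instance, this gives n = 9, so a block of 11 inputs can yield at most 9 outputs in one pass, and the leftover c is still a sizeable fraction of 5^m.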

The difficulty with the above analysis is twofold: the equation 5^m-7^n=c is not easy to solve exactly, and we lose efficiency whenever the uniform value from 1 to 5^m exceeds 7^n.

The question is how close to the best possible value of m (log5/log7) we can get. For example, when this number is close to an integer, can we find a way to achieve exactly that integral number of output values?

If 5^m-7^n=c, then from the input stream we effectively generate a uniform random number from 1 to 5^m and don't use any values higher than 7^n. However, these values can be rescued and used again: they effectively form a uniform sequence of numbers from 1 to 5^m-7^n. So we can try to convert these into 7-ary numbers as well, creating more output values.

Let T7(X) be the average length of the output sequence of random(1-7) integers derived from a uniform input of size X, and assume that 5^m=7^n0+7^n1+7^n2+...+7^nr+s, with s<7.

Then T7(5^m) = n0*7^n0/5^m + ((5^m-7^n0)/5^m)*T7(5^m-7^n0), since we get a sequence of length n0 with probability 7^n0/5^m, and a residual of size 5^m-7^n0 with probability (5^m-7^n0)/5^m.

If we just keep substituting we obtain:

T7(5^m) = n0*7^n0/5^m + n1*7^n1/5^m + ... + nr*7^nr/5^m = (n0*7^n0 + n1*7^n1 + ... + nr*7^nr)/5^m

Hence

L(m) = T7(5^m) = (n0*7^n0 + n1*7^n1 + ... + nr*7^nr)/(7^n0+7^n1+7^n2+...+7^nr+s)

Another way of putting this is:

If 5^m has 7-ary representation a0 + a1*7 + a2*7^2 + a3*7^3 + ... + ar*7^r,
then L(m) = (a1*7 + 2*a2*7^2 + 3*a3*7^3 + ... + r*ar*7^r)/(a0 + a1*7 + a2*7^2 + a3*7^3 + ... + ar*7^r)
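The digit form is easy to evaluate exactly. The following is only a sketch that plugs the base-7 digits of 5^m into the formula as stated; the names base7_digits and expected_outputs are mine.

```python
from fractions import Fraction


def base7_digits(value):
    """Digits a0, a1, ..., ar of value in base 7, least significant first."""
    digits = []
    while value > 0:
        value, digit = divmod(value, 7)
        digits.append(digit)
    return digits


def expected_outputs(m):
    """L(m) = sum(i * a_i * 7**i) / 5**m, with a_i the base-7 digits of 5**m."""
    digits = base7_digits(5 ** m)
    numerator = sum(i * a * 7 ** i for i, a in enumerate(digits))
    return Fraction(numerator, 5 ** m)


for m in (1, 2, 5):
    print(m, base7_digits(5 ** m), float(expected_outputs(m)))
```

For m = 2 the digits of 25 are 4 and 3 (25 = 4 + 3*7), so L(2) = 21/25 = 0.84, against an upper bound of about 1.65.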

The best possible case is my original one above where 5^m=7^n+s, where s<7.

Then T7(5^m) = n*(7^n)/(7^n+s) = n+o(1) = m (Log5/Log7)+o(1) as before.

The worst case is when we can only find k and s such that 5^m = k*7+s.

Then T7(5^m) = 1*(k*7)/(k*7+s) = 1+o(1)

Other cases are somewhere in between. It would be interesting to see how well we can do for very large m, i.e. how small we can make the error term:

T7(5^m) = m (Log5/Log7)+e(m)

It seems impossible to achieve e(m) = o(1) in general but hopefully we can prove e(m)=o(m).

The whole thing then rests on the distribution of the 7-ary digits of 5^m for various values of m.

I'm sure there is a lot of theory out there that covers this. I may have a look and report back at some point.

千与千寻千般痛.
#3 · 2018-12-31 12:42

I know it has been answered, but this seems to work OK, though I cannot tell you whether it has a bias. My 'testing' suggests it is, at least, reasonable.

Perhaps Adam Rosenfield would be kind enough to comment?

My (naive?) idea is this:

Accumulate rand5's until there are enough random bits to make a rand7. This takes at most 2 rand5's. To get the rand7 number I use the accumulated value mod 7.

To keep the accumulator from overflowing, and since only its value mod 7 is used, I reduce the accumulator mod 7:

(5a + rand5) % 7 = (k*7 + (5a%7) + rand5) % 7 = ( (5a%7) + rand5) % 7
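For anyone who wants to convince themselves of the identity, here is a small Python check (not part of my C code). It compares the next output with and without reducing the accumulator first; the function names are made up for this check.

```python
def step_full(a, r):
    """Next output if the accumulator were never reduced."""
    return (5 * a + r) % 7


def step_reduced(a, r):
    """Next output when the accumulator is reduced mod 7 first."""
    return (5 * (a % 7) + r) % 7


# Try every rand5 output r in 0-4 against a range of accumulator values.
mismatches = [(a, r) for a in range(10000) for r in range(5)
              if step_full(a, r) != step_reduced(a, r)]
print(len(mismatches))  # 0
```

This only confirms that reducing mod 7 does not change the outputs; it says nothing about whether they are biased.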

The rand7() function follows:

(I let the range of rand5 be 0-4 and rand7 is likewise 0-6.)

int rand7(){
  static int    a=0;
  static int    e=0;
  int       r;
  a = a * 5 + rand5();
  e = e + 5;        // added 5/7ths of a rand7 number
  if ( e<7 ){
    a = a * 5 + rand5();
    e = e + 5;  // another 5/7ths
  }
  r = a % 7;
  e = e - 7;        // removed a rand7 number
  a = a % 7;
  return r;
}

Edit: Added results for 100 million trials.

'Real' rand functions mod 5 or 7

rand5 : avg=1.999802 0:20003944 1:19999889 2:20003690 3:19996938 4:19995539
rand7 : avg=3.000111 0:14282851 1:14282879 2:14284554 3:14288546 4:14292388 5:14288736 6:14280046

My rand7

Average looks ok and number distributions look ok too.

randt : avg=3.000080 0:14288793 1:14280135 2:14287848 3:14285277 4:14286341 5:14278663 6:14292943

柔情千种
#4 · 2018-12-31 12:45

By using a rolling total, you can both

  • maintain an equal distribution; and
  • not have to sacrifice any element in the random sequence.

Both of these are problems with the simplistic rand(5)+rand(5)...-type solutions. The following Python code shows how to implement it (most of the code is there to demonstrate the distribution).

import random
x = []
for i in range (0,7):
    x.append (0)
t = 0
tt = 0
for i in range (0,700000):
    ########################################
    #####            qq.py             #####
    r = int (random.random () * 5)
    t = (t + r) % 7
    ########################################
    #####       qq_notsogood.py        #####
    #r = 20
    #while r > 6:
        #r =     int (random.random () * 5)
        #r = r + int (random.random () * 5)
    #t = r
    ########################################
    x[t] = x[t] + 1
    tt = tt + 1
high = x[0]
low = x[0]
for i in range (0,7):
    print("%d: %7d %.5f" % (i, x[i], 100.0 * x[i] / tt))
    if x[i] < low:
        low = x[i]
    if x[i] > high:
        high = x[i]
diff = high - low
print("Variation = %d (%.5f%%)" % (diff, 100.0 * diff / tt))

And this output shows the results:

pax$ python qq.py
0:   99908 14.27257
1:  100029 14.28986
2:  100327 14.33243
3:  100395 14.34214
4:   99104 14.15771
5:   99829 14.26129
6:  100408 14.34400
Variation = 1304 (0.18629%)

pax$ python qq.py
0:   99547 14.22100
1:  100229 14.31843
2:  100078 14.29686
3:   99451 14.20729
4:  100284 14.32629
5:  100038 14.29114
6:  100373 14.33900
Variation = 922 (0.13171%)

pax$ python qq.py
0:  100481 14.35443
1:   99188 14.16971
2:  100284 14.32629
3:  100222 14.31743
4:   99960 14.28000
5:   99426 14.20371
6:  100439 14.34843
Variation = 1293 (0.18471%)

A simplistic rand(5)+rand(5), ignoring those cases where it returns more than 6, has a typical variation of 18%, about 100 times that of the method shown above:

pax$ python qq_notsogood.py
0:   31756 4.53657
1:   63304 9.04343
2:   95507 13.64386
3:  127825 18.26071
4:  158851 22.69300
5:  127567 18.22386
6:   95190 13.59857
Variation = 127095 (18.15643%)

pax$ python qq_notsogood.py
0:   31792 4.54171
1:   63637 9.09100
2:   95641 13.66300
3:  127627 18.23243
4:  158751 22.67871
5:  126782 18.11171
6:   95770 13.68143
Variation = 126959 (18.13700%)

pax$ python qq_notsogood.py
0:   31955 4.56500
1:   63485 9.06929
2:   94849 13.54986
3:  127737 18.24814
4:  159687 22.81243
5:  127391 18.19871
6:   94896 13.55657
Variation = 127732 (18.24743%)
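The skew in the qq_notsogood.py runs can also be worked out exactly rather than sampled. This is a small sketch that enumerates the 25 equally likely pairs; the variable names are just for illustration.

```python
from collections import Counter
from fractions import Fraction

# Enumerate all 25 equally likely pairs of rand5 outputs (0-4 each) and keep
# only the sums that do not exceed 6, as the rejection loop does.
counts = Counter(a + b for a in range(5) for b in range(5) if a + b <= 6)
kept = sum(counts.values())
exact = {total: Fraction(n, kept) for total, n in sorted(counts.items())}

for total, p in exact.items():
    print("%d: %.5f" % (total, 100 * float(p)))
```

Only 22 of the 25 pairs survive, with 0 appearing once and 4 appearing five times, so the exact spread is 5/22 - 1/22 = 2/11, or about 18.18%, in line with the runs above.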

And, on the advice of Nixuz, I've cleaned the script up so you can just extract and use the rand7... stuff:

import random

# rand5() returns 0 through 4 inclusive.

def rand5():
    return int (random.random () * 5)

# rand7() generator returns 0 through 6 inclusive (using rand5()).

def rand7():
    rand7ret = 0
    while True:
        rand7ret = (rand7ret + rand5()) % 7
        yield rand7ret

# Number of test runs.

count = 700000

# Work out distribution.

distrib = [0,0,0,0,0,0,0]
rgen = rand7()
for i in range (0,count):
    r = next(rgen)
    distrib[r] = distrib[r] + 1

# Print distributions and calculate variation.

high = distrib[0]
low = distrib[0]
for i in range (0,7):
    print("%d: %7d %.5f" % (i, distrib[i], 100.0 * distrib[i] / count))
    if distrib[i] < low:
        low = distrib[i]
    if distrib[i] > high:
        high = distrib[i]
diff = high - low
print("Variation = %d (%.5f%%)" % (diff, 100.0 * diff / count))
旧人旧事旧时光
#5 · 2018-12-31 12:48
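int rand7(void)
{
     int r;
     int ans = 0;
     while (ans == 0)   // reject 0 so the result is in 1-7
     {
          for (int i=0; i<3; i++)
          {
               // rand5() returns 1-5; discard 3 so that (r < 3) is a fair coin
               while ((r = rand5()) == 3){};
               ans += (r < 3) << i;   // set bit i with probability 1/2
          }
     }
     return ans;
}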
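For quick experimentation, here is a Python sketch of the same bit-building idea, assuming the intent is to set bit i from a fair coin and to reject an all-zero result. The rand5() below is a stand-in built on the random module, not the function from the question.

```python
import random


def rand5():
    """Stand-in source: uniform integer in 1-5 (an assumption for this sketch)."""
    return random.randint(1, 5)


def rand7():
    """Build three fair bits from rand5(), rejecting the all-zero pattern."""
    while True:
        value = 0
        for bit in range(3):
            r = rand5()
            while r == 3:       # discard the middle value so r < 3 is a fair coin
                r = rand5()
            if r < 3:
                value |= 1 << bit
        if value != 0:          # 0 is rejected, leaving 1-7 equally likely
            return value
```

Each of the 8 bit patterns is equally likely, so after dropping 0 the remaining seven values are uniform; the cost is extra rand5() calls for the rejected draws.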
人间绝色
#6 · 2018-12-31 12:49

As long as there are fewer than seven possibilities left to choose from, draw another random number, which multiplies the number of possibilities by five. In Perl:

$num = 0;
$possibilities = 1;

sub rand7
{
  while( $possibilities < 7 )
  {
    $num = $num * 5 + int(rand(5));
    $possibilities *= 5;
  }
  my $result = $num % 7;
  $num = int( $num / 7 );
  $possibilities /= 7;
  return $result;
}
笑指拈花
#7 · 2018-12-31 12:49

There you go, uniform distribution and zero rand5 calls.

def rand7():
    global seed   # module-level counter
    seed += 1
    if seed >= 7:
        seed = 0
    return seed

Need to set seed beforehand.
