Expand a random range from 1–5 to 1–7

Posted 2018-12-31 12:10

Given a function which produces a random integer in the range 1 to 5, write a function which produces a random integer in the range 1 to 7.

  1. What is a simple solution?
  2. What is an effective solution to reduce memory usage or run on a slower CPU?

30 answers
谁念西风独自凉
#2 · 2018-12-31 12:50

(I have stolen Adam Rosenfield's answer and made it run about 7% faster.)

Assume that rand5() returns one of {0,1,2,3,4} with equal distribution and the goal is to return {0,1,2,3,4,5,6} with equal distribution.

int rand7() {
  int i = 5 * rand5() + rand5();
  int max = 25;
  //i is uniform among {0 ... max-1}
  while(i < max%7) {
    //i is uniform among {0 ... (max%7 - 1)}
    i *= 5;
    i += rand5(); //i is uniform {0 ... (((max%7)*5) - 1)}
    max %= 7;
    max *= 5; //once again, i is uniform among {0 ... max-1}
  }
  //i is uniform among {max%7 ... max-1}, a run whose length is a multiple of 7
  return(i%7);
}

We keep track of the number of values i can take in the variable max: i is always uniform among 0 ... max-1. If the result so far is between max%7 and max-1, then it is uniformly distributed in that range, and because that range holds a multiple of 7 values, i%7 is uniform. If not, we use the remainder, which is random between 0 and max%7-1, and another call to rand5() to make a new number and a new max. Then we start again.
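For experimenting, here is a minimal Python sketch of the same loop. It assumes, as the answer does, a `rand5()` returning 0..4; the stand-in built on `random.randrange` is only a placeholder for the given generator, and the tally at the end is a quick empirical check, not a proof.

```python
import random
from collections import Counter

def rand5():
    """Placeholder for the given generator: uniform on 0..4."""
    return random.randrange(5)

def rand7():
    """Port of the C loop above; returns a value in 0..6."""
    i = 5 * rand5() + rand5()   # uniform on 0 .. 24
    mx = 25                     # invariant: i is uniform on 0 .. mx-1
    while i < mx % 7:
        # i is uniform on the leftover range 0 .. mx%7 - 1: append one more base-5 digit
        i = 5 * i + rand5()
        mx = (mx % 7) * 5
    # i is uniform on mx%7 .. mx-1, a run whose length is a multiple of 7
    return i % 7

random.seed(0)
counts = Counter(rand7() for _ in range(70_000))
print(sorted(counts.items()))
```

Naming the bound `mx` avoids shadowing Python's built-in `max`.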

Edit: The expected number of calls to rand5() is the x that solves this equation:

x =  2     * 21/25
   + 3     *  4/25 * 14/20
   + 4     *  4/25 *  6/20 * 28/30
   + 5     *  4/25 *  6/20 *  2/30 * 7/10
   + 6     *  4/25 *  6/20 *  2/30 * 3/10 * 14/15
   + (6+x) *  4/25 *  6/20 *  2/30 * 3/10 *  1/15
x = about 2.21 calls to rand5()
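The equation can be checked mechanically. The sketch below uses a hypothetical helper, `expected_rand5_calls`, that is not part of the answer. It walks the chain of max values 25, 20, 30, 10, 15, 5 and back to 25 with exact fractions, adds up calls times P(accept) at each stage, and then solves x = partial + P(reach the cycle end) * (extra calls + x) once the loop returns to the starting state.

```python
from fractions import Fraction

def expected_rand5_calls():
    """Expected rand5() calls per rand7() for the loop above, computed exactly."""
    calls, mx = 2, 25            # state after the first two calls: i uniform on [0, 25)
    reach = Fraction(1)          # probability the loop reaches the current state
    partial = Fraction(0)        # sum of calls * P(accept at this state)
    while True:
        p_reject = Fraction(mx % 7, mx)
        partial += calls * reach * (1 - p_reject)
        reach *= p_reject
        calls += 1
        mx = (mx % 7) * 5
        if mx == 25:
            # Back at the starting state, having spent `calls` instead of 2:
            # x = partial + reach * (calls - 2 + x), solved for x.
            return (partial + reach * (calls - 2)) / (1 - reach)

x = expected_rand5_calls()
print(x, float(x))
```

It yields exactly 1115/504, about 2.2123, matching the estimate above.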
君临天下
#3 · 2018-12-31 12:50

Algorithm:

7 can be represented in a sequence of 3 bits

Use rand(5) to randomly fill each bit with 0 or 1.
For example, call rand(5) and:

if the result is 1 or 2, fill the bit with 0
if the result is 4 or 5, fill the bit with 1
if the result is 3, ignore it and do it again (rejection)

This way we can fill 3 bits randomly with 0/1 and get a number from 0-7. If all three bits are 0, we reject the result and start over, which leaves a uniform number from 1-7.

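EDIT: This seems like the simplest answer, so here's some code for it. It is not the most call-efficient, though: each bit costs 5/4 calls to rand(5) on average and 1 in 8 bit triples (000) is rejected, so it averages 3 * 5/4 * 8/7 = 30/7 ≈ 4.3 calls per output: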

public static int random_7() {
    int returnValue = 0;
    while (returnValue == 0) {
        for (int i = 1; i <= 3; i++) {
            returnValue = (returnValue << 1) + random_5_output_2();
        }
    }
    return returnValue;
}

private static int random_5_output_2() {
    while (true) {
        int flip = random_5();

        if (flip < 3) {
            return 0;
        }
        else if (flip > 3) {
            return 1;
        }
    }
}
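To see the cost of the rejections, here is a rough Python port of the Java code that also counts calls. `random_5` is a placeholder built on `random.randint`, and the call counter is an addition for illustration only. On average each bit costs 5/4 calls and 1 in 8 triples is rejected, so the expected cost is 30/7 calls per output.

```python
import random

calls = 0

def random_5():
    """Placeholder for the given generator: uniform on 1..5. Counts its calls."""
    global calls
    calls += 1
    return random.randint(1, 5)

def random_5_output_2():
    """1 or 2 -> 0, 4 or 5 -> 1, 3 -> reject and roll again."""
    while True:
        flip = random_5()
        if flip < 3:
            return 0
        if flip > 3:
            return 1

def random_7():
    """Build 3 random bits; reject 000 so the result is uniform on 1..7."""
    while True:
        value = 0
        for _ in range(3):
            value = (value << 1) + random_5_output_2()
        if value != 0:
            return value

random.seed(1)
N = 70_000
samples = [random_7() for _ in range(N)]
print(f"average calls per output: {calls / N:.3f}")  # theory: 30/7, about 4.286
```

The Java version resets `returnValue` implicitly (it is 0 whenever the loop repeats); the port resets `value` explicitly, which is equivalent.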
美炸的是我
#4 · 2018-12-31 12:52

The premise behind Adam Rosenfield's correct answer is:

  • x = 5^n (in his case: n=2)
  • manipulate n rand5 calls to get a number y within range [1, x]
  • z = ((int)(x / 7)) * 7
  • if y > z, try again. else return y % 7 + 1

When n equals 2, you have 4 throw-away possibilities: y = {22, 23, 24, 25}. If you use n equals 6, you only have 1 throw-away: y = {15625}.

5^6 = 15625
7 * 2232 = 15624

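You call rand5 more times per attempt (6 instead of 2, so about 6.0 instead of about 2.38 calls per output on average). However, the chance of getting a throw-away value drops from 4/25 to 1/15625, so long runs of retries become far less likely. There is no n that gives zero throw-away values for y: 5^n is never a multiple of 7, so some values always have to be rejected.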
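Here is a minimal Python sketch of this premise with n as a parameter. `rand5` is a placeholder returning 1..5, and `exact_counts` is a hypothetical helper that enumerates every equally likely sequence of n rolls to confirm that each output 1..7 is hit equally often among the accepted y values.

```python
import itertools
import random

def rand5():
    """Placeholder for the given generator: uniform on 1..5."""
    return random.randint(1, 5)

def to_y(rolls):
    """Combine n rolls in 1..5 into y in [1, 5**n] (base-5 digits, shifted by one)."""
    y = 0
    for r in rolls:
        y = 5 * y + (r - 1)
    return y + 1

def rand7(n=2):
    x = 5 ** n
    z = (x // 7) * 7            # largest multiple of 7 that is <= x
    while True:
        y = to_y(rand5() for _ in range(n))
        if y <= z:              # values above z are the throw-aways
            return y % 7 + 1

def exact_counts(n):
    """Tally outputs over all 5**n equally likely roll sequences."""
    z = (5 ** n // 7) * 7
    counts = [0] * 7
    for rolls in itertools.product(range(1, 6), repeat=n):
        y = to_y(rolls)
        if y <= z:
            counts[y % 7] += 1  # index y % 7 corresponds to output y % 7 + 1
    return counts

print(exact_counts(2))
print(exact_counts(6))
```

For n = 2 every output appears 3 times among the 21 accepted values; for n = 6 each appears 2232 times among 15624.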

还给你的自由
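#5 · 2018-12-31 12:53
/* Assumes rand5() returns 1..5, as in the question. */
int randbit( void )
{
    while( 1 )
    {
        int r = rand5();
        /* Keep 1..4 (two odd, two even values) and reject 5, so the bit is fair. */
        if( r <= 4 ) return(r & 1);
    }
}

/* Concatenate nbits fair bits into an integer uniform on [0, 2^nbits). */
int randint( int nbits )
{
    int result = 0;
    while( nbits-- )
    {
        result = (result<<1) | randbit();
    }
    return( result );
}

int rand7( void )
{
    while( 1 )
    {
        int r = randint( 3 ) + 1;   /* uniform on 1..8 */
        if( r <= 7 ) return( r );   /* reject 8 and try again */
    }
}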
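The `r <= 4` guard in `randbit()` only produces a fair bit if `rand5()` returns 1..5, as the question states. A quick Python enumeration (the helper name `bit_counts` is made up for this illustration) shows what happens under each convention:

```python
from collections import Counter

def bit_counts(rand5_values):
    """Tally the bit randbit() returns for each rand5() value it accepts."""
    return Counter(r & 1 for r in rand5_values if r <= 4)

print(bit_counts(range(1, 6)))  # rand5() in 1..5: keeps 1..4, two 0s and two 1s
print(bit_counts(range(0, 5)))  # rand5() in 0..4: keeps everything, three 0s vs two 1s
```

With a 0..4 generator, the guard would need to become `r <= 3` to stay balanced.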
大哥的爱人
#6 · 2018-12-31 12:54

I'd like to add another answer, in addition to my first answer. This answer attempts to minimize the number of calls to rand5() per call to rand7(), to maximize the usage of randomness. That is, if you consider randomness to be a precious resource, we want to use as much of it as possible, without throwing away any random bits. This answer also has some similarities with the logic presented in Ivan's answer.

The entropy of a random variable is a well-defined quantity. For a random variable which takes on N states with equal probabilities (a uniform distribution), the entropy is log2 N. Thus, rand5() has approximately 2.32193 bits of entropy, and rand7() has about 2.80735 bits of entropy. If we hope to maximize our use of randomness, we need to use all 2.32193 bits of entropy from each call to rand5(), and apply them to generating 2.80735 bits of entropy needed for each call to rand7(). The fundamental limit, then, is that we can do no better than log(7)/log(5) = 1.20906 calls to rand5() per call to rand7().
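The figures in this paragraph follow directly from log2 N; a short check with the standard `math.log2`:

```python
import math

bits5 = math.log2(5)          # entropy of one uniform rand5() result
bits7 = math.log2(7)          # entropy of one uniform rand7() result
lower_bound = bits7 / bits5   # equals log(7)/log(5) in any base

print(f"rand5: {bits5:.5f} bits, rand7: {bits7:.5f} bits")
print(f"at least {lower_bound:.5f} rand5() calls per rand7() on average")
```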

Side notes: all logarithms in this answer will be base 2 unless specified otherwise. rand5() will be assumed to return numbers in the range [0, 4], and rand7() will be assumed to return numbers in the range [0, 6]. Adjusting the ranges to [1, 5] and [1, 7] respectively is trivial.

So how do we do it? We generate an infinitely precise random real number between 0 and 1 (pretend for the moment that we could actually compute and store such an infinitely precise number -- we'll fix this later). We can generate such a number by generating its digits in base 5: we pick the random number $0.a_1a_2a_3\ldots$, where each digit $a_i$ is chosen by a call to rand5(). For example, if our RNG chose $a_i = 1$ for all $i$, then ignoring the fact that that isn't very random, that would correspond to the real number $1/5 + 1/5^2 + 1/5^3 + \cdots = 1/4$ (sum of a geometric series).
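The geometric-series example can be checked with exact rationals. In this sketch `base5_value` is a hypothetical helper that evaluates a finite prefix $0.a_1a_2\ldots a_k$ in base 5; with every digit equal to 1, the gap to 1/4 is exactly $1/(4 \cdot 5^k)$, so it shrinks toward 0 as more digits are drawn.

```python
from fractions import Fraction

def base5_value(digits):
    """Value of 0.a1 a2 ... ak (base 5) for a finite list of digits in 0..4."""
    return sum(Fraction(a, 5 ** i) for i, a in enumerate(digits, start=1))

for k in (1, 2, 5, 10):
    gap = Fraction(1, 4) - base5_value([1] * k)
    print(k, gap)   # the gap equals 1 / (4 * 5**k)
```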

Ok, so we've picked a random real number between 0 and 1. I now claim that such a random number is uniformly distributed. Intuitively, this is easy to understand, since each digit was picked uniformly, and the number is infinitely precise. However, a formal proof of this is somewhat more involved, since now we're dealing with a continuous distribution instead of a discrete distribution, so we need to prove that the probability that our number lies in an interval [a, b] equals the length of that interval, b - a. The proof is left as an exercise for the reader =).

Now that we have a random real number selected uniformly from the range [0, 1], we need to convert it to a series of uniformly random numbers in the range [0, 6] to generate the output of rand7(). How do we do this? Just the reverse of what we just did -- we convert it to an infinitely precise decimal in base 7, and then each base 7 digit will correspond to one output of rand7().

Taking the example from earlier, if our rand5() produces an infinite stream of 1's, then our random real number will be 1/4. Converting 1/4 to base 7, we get the infinite decimal 0.15151515..., so we will produce as output 1, 5, 1, 5, 1, 5, etc.

Ok, so we have the main idea, but we have two problems left: we can't actually compute or store an infinitely precise real number, so how do we deal with only a finite portion of it? Secondly, how do we actually convert it to base 7?

One way we can convert a number between 0 and 1 to base 7 is as follows:

  1. Multiply by 7
  2. The integral part of the result is the next base 7 digit
  3. Subtract off the integral part, leaving only the fractional part
  4. Goto step 1
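The four steps above translate almost line for line into Python on exact fractions. `base7_digits` is an illustrative name, not from the answer; it reproduces the 1/4 example from earlier.

```python
from fractions import Fraction

def base7_digits(x, count):
    """First `count` base-7 digits of a rational x in [0, 1), by repeated multiply-by-7."""
    digits = []
    for _ in range(count):
        x *= 7          # step 1: multiply by 7
        d = int(x)      # step 2: the integral part is the next digit (x >= 0, so this is floor)
        digits.append(d)
        x -= d          # step 3: keep only the fractional part
    return digits       # step 4 is the loop itself

print(base7_digits(Fraction(1, 4), 6))  # [1, 5, 1, 5, 1, 5]
```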

To deal with the problem of infinite precision, we compute a partial result, and we also store an upper bound on what the result could be. That is, suppose we've called rand5() twice and it returned 1 both times. The number we've generated so far is 0.11 (base 5). Whatever the rest of the infinite series of calls to rand5() produces, the random real number we're generating will always be smaller than 0.12: it is always true that 0.11 ≤ 0.11xyz... < 0.12.

So, keeping track of the current number so far and the maximum value it could ever take, we convert both numbers to base 7. If they agree on the first k digits, then we can safely output those k digits as the next k results: regardless of what the infinite stream of base-5 digits is, it will never affect those digits of the base-7 representation!
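The agreement test can be tried out on exact fractions. This Python sketch uses a hypothetical function name and recomputes everything from a finite base-5 prefix rather than keeping incremental state like the generator in this answer. Since the random real is strictly below the upper bound, a digit counts as fixed when 7 * upper does not exceed that digit + 1.

```python
from fractions import Fraction

def determined_base7_digits(base5_prefix):
    """Base-7 digits shared by every completion of 0.<prefix> (base 5).

    The random real lies in [lo, hi) with lo = value of the prefix and
    hi = lo + 5**-k. A digit is safe once all of [lo, hi) maps into one digit cell.
    """
    k = len(base5_prefix)
    lo = sum(Fraction(a, 5 ** i) for i, a in enumerate(base5_prefix, start=1))
    hi = lo + Fraction(1, 5 ** k)
    digits = []
    while True:
        d = int(lo * 7)            # base-7 digit of the lower bound
        if hi * 7 > d + 1:         # part of [lo, hi) has a larger digit: stop
            return digits
        digits.append(d)
        lo, hi = lo * 7 - d, hi * 7 - d   # zoom into the chosen cell

print(determined_base7_digits([1, 1]))     # 0.11 (base 5) fixes only the first digit
print(determined_base7_digits([1] * 10))   # starts 1, 5, 1, 5, ... like 1/4
```

The loop always terminates: each pass multiplies the interval width by 7, so it eventually spans more than one cell.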

And that's the algorithm -- to generate the next output of rand7(), we generate only as many digits of rand5() as we need to ensure that we know with certainty the value of the next digit in the conversion of the random real number to base 7. Here is a Python implementation, with a test harness:

import math
import random

rand5_calls = 0
def rand5():
    global rand5_calls
    rand5_calls += 1
    return random.randint(0, 4)

def rand7_gen():
    state = 0
    pow5 = 1
    pow7 = 7
    while True:
        # Integer division: the next base-7 digit is known once the lower
        # and upper bounds of the random real agree on it.
        if state // pow5 == (state + pow7) // pow5:
            result = state // pow5
            state = (state - result * pow5) * 7
            pow7 *= 7
            yield result
        else:
            # Not yet determined: draw another base-5 digit.
            state = 5 * state + pow7 * rand5()
            pow5 *= 5

if __name__ == '__main__':
    r7 = rand7_gen()
    N = 10000
    x = [next(r7) for i in range(N)]
    distr = [x.count(i) for i in range(7)]
    expmean = N / 7.0
    expstddev = math.sqrt(N * (1.0/7.0) * (6.0/7.0))

    print('%d TRIALS' % N)
    print('Expected mean: %.1f' % expmean)
    print('Expected standard deviation: %.1f' % expstddev)
    print()
    print('DISTRIBUTION:')
    for i in range(7):
        print('%d: %d   (%+.3f stddevs)' % (i, distr[i], (distr[i] - expmean) / expstddev))
    print()
    print('Calls to rand5: %d (average of %f per call to rand7)' % (rand5_calls, float(rand5_calls) / N))

Note that rand7_gen() returns a generator, since it has internal state involving the conversion of the number to base 7. The test harness calls next(r7) 10000 times to produce 10000 random numbers, and then it measures their distribution. Only integer math is used, so the results are exactly correct.

Also note that the numbers here get very big, very fast. Powers of 5 and 7 grow quickly. Hence, performance will start to degrade noticeably after generating lots of random numbers, due to bignum arithmetic. But remember here, my goal was to maximize the usage of random bits, not to maximize performance (although that is a secondary goal).

In one run of this, I made 12091 calls to rand5() for 10000 calls to rand7(), achieving the minimum of log(7)/log(5) calls on average to 4 significant figures, and the resulting output was uniform.

In order to port this code to a language that doesn't have arbitrarily large integers built in, you'll have to cap the values of pow5 and pow7 to the maximum value of your native integral type -- if they get too big, then reset everything and start over. This will increase the average number of calls to rand5() per call to rand7() very slightly, but hopefully it shouldn't increase too much even for 32- or 64-bit integers. One caveat: whether a digit has already been emitted when the reset happens depends on where the random real lies, so discarding the partial state this way can introduce a small bias in the output.

爱死公子算了
#7 · 2018-12-31 12:54

Here's a solution that fits entirely within integers and is within about 4% of optimal (i.e. it uses about 1.26 random numbers in {0..4} for every one in {0..6}). The code's in Scala, but the math should be reasonably clear in any language: you take advantage of the fact that 7^9 + 7^8 = 46,118,408 is fairly close to 5^11 = 48,828,125. So you pick an 11-digit number in base 5 and interpret it as a 9-digit number in base 7 if it's below 7^9 (giving 9 base-7 digits), as an 8-digit number if it falls in the next 7^8 values, or as a 7-digit number if it falls in the next 3 * 7^7 values; anything left over is rejected and redrawn:

abstract class RNG {
  def apply(): Int
}

class Random5 extends RNG {
  val rng = new scala.util.Random
  var count = 0
  def apply(): Int = { count += 1; rng.nextInt(5) }
}

class FiveSevener(five: RNG) {
  val sevens = new Array[Int](9)
  var nsevens = 0
  val to9 = 40353607 // 7^9
  val to8 = 5764801  // 7^8
  val to7 = 823543   // 7^7
  // Split value into `count` base-7 digits
  def loadSevens(value: Int, count: Int): Unit = {
    nsevens = 0
    var remaining = value
    while (nsevens < count) {
      sevens(nsevens) = remaining % 7
      remaining /= 7
      nsevens += 1
    }
  }
  def loadSevens(): Unit = {
    // Draw an 11-digit base-5 number, uniform on [0, 5^11)
    var fivepow11 = 0
    var i = 0
    while (i < 11) { i += 1; fivepow11 = five() + fivepow11 * 5 }
    if (fivepow11 < to9) { loadSevens(fivepow11, 9); return }
    fivepow11 -= to9
    if (fivepow11 < to8) { loadSevens(fivepow11, 8); return }
    fivepow11 -= to8
    if (fivepow11 < 3 * to7) loadSevens(fivepow11 % to7, 7)
    else loadSevens() // leftover range: reject and redraw
  }
  def apply(): Int = {
    if (nsevens == 0) loadSevens()
    nsevens -= 1
    sevens(nsevens)
  }
}

If you paste a test into the interpreter (REPL actually), you get:

scala> val five = new Random5
five: Random5 = Random5@e9c592

scala> val seven = new FiveSevener(five)
seven: FiveSevener = FiveSevener@143c423

scala> val counts = new Array[Int](7)
counts: Array[Int] = Array(0, 0, 0, 0, 0, 0, 0)

scala> var i=0 ; while (i < 100000000) { counts( seven() ) += 1 ; i += 1 }
i: Int = 100000000

scala> counts
res0: Array[Int] = Array(14280662, 14293012, 14281286, 14284836, 14287188,
14289332, 14283684)

scala> five.count
res1: Int = 125902876

The distribution is nice and flat: every bin is within about 7,300 of 1/7 of 10^8 (14,285,714), which is roughly two standard deviations (sqrt(10^8 * 1/7 * 6/7) ≈ 3,500), as expected from an approximately Gaussian distribution.
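The 1.26 figure and the "about 4%" claim can be reproduced from the constants in the code. This Python sketch (not part of the original answer) averages the number of base-7 digits produced per 11-call block, counting rejected blocks as producing none:

```python
import math

to9, to8, to7 = 7 ** 9, 7 ** 8, 7 ** 7
block = 5 ** 11                      # one draw of 11 base-5 digits

assert to9 + to8 + 3 * to7 <= block  # all accepted ranges fit inside 5**11

# Expected base-7 digits per 11-call block (rejected blocks contribute 0)
digits_per_block = (9 * to9 + 8 * to8 + 7 * 3 * to7) / block
calls_per_output = 11 / digits_per_block
optimum = math.log(7) / math.log(5)
print(f"{calls_per_output:.4f} calls per output, "
      f"{calls_per_output / optimum - 1:.1%} above the optimum")
```

This gives about 1.259 calls per output, roughly 4% above the log(7)/log(5) bound, in line with the 125902876 calls counted for 10^8 outputs in the REPL session.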
