Generating random integer from a range

2018-12-31 07:37发布

I need a function which would generate a random integer in given range (including border values). I don't unreasonable quality/randomness requirements, I have four requirements:

  • I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
  • I need it to be reasonably uniform (use of rand() is perfectly fine).
  • the min-max ranges can be anything from <0, 1> to <-32727, 32727>.
  • it has to be seedable.

I currently have following C++ code:

output = min + (rand() * (int)(max - min) / RAND_MAX)

The problem is, that it is not really uniform - max is returned only when rand() = RAND_MAX (for Visual C++ it is 1/32727). This is major issue for small ranges like <-1, 1>, where the last value is almost never returned.

So I grabbed pen and paper and came up with following formula (which builds on the (int)(n + 0.5) integer rounding trick):

enter image description here

But it still doesn't give me uniform distribution. Repeated runs with 10000 samples give me ratio of 37:50:13 for values values -1, 0. 1.

Could you please suggest better formula? (or even whole pseudo-random number generator function)

标签: c++ random
12条回答
路过你的时光
2楼-- · 2018-12-31 07:50

assume min and max are int values, [ and ] means include this value, ( and ) means not include this value, using above to get the right value using c++ rand()

reference: for ()[] define, visit:

https://en.wikipedia.org/wiki/Interval_(mathematics)

for rand and srand function or RAND_MAX define, visit:

http://en.cppreference.com/w/cpp/numeric/random/rand

[min, max]

int randNum = rand() % (max - min + 1) + min

(min, max]

int randNum = rand() % (max - min) + min + 1

[min, max)

int randNum = rand() % (max - min) + min

(min, max)

int randNum = rand() % (max - min - 1) + min + 1
查看更多
零度萤火
3楼-- · 2018-12-31 07:51

Here is an unbiased version that generates numbers in [low, high]:

int r;
do {
  r = rand();
} while (r < ((unsigned int)(RAND_MAX) + 1) % (high + 1 - low));
return r % (high + 1 - low) + low;

If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the do loop.

查看更多
十年一品温如言
4楼-- · 2018-12-31 07:51

In this thread rejection sampling was already discussed, but I wanted to suggest one optimization based on the fact that rand() % 2^something does not introduce any bias as already mentioned above.

The algorithm is really simple:

  • calculate the smallest power of 2 greater than the interval length
  • randomize one number in that "new" interval
  • return that number if it is less than the length of the original interval
    • reject otherwise

Here's my sample code:

int randInInterval(int min, int max) {
    int intervalLen = max - min + 1;
    //now calculate the smallest power of 2 that is >= than `intervalLen`
    int ceilingPowerOf2 = pow(2, ceil(log2(intervalLen)));

    int randomNumber = rand() % ceilingPowerOf2; //this is "as uniform as rand()"

    if (randomNumber < intervalLen)
        return min + randomNumber;      //ok!
    return randInInterval(min, max);    //reject sample and try again
} 

This works well especially for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.

PS
Obviously avoiding the recursion would be more efficient (no need to calculate over and over the log ceiling..) but I thought it was more readable for this example.

查看更多
浮光初槿花落
5楼-- · 2018-12-31 07:52

The formula for this is very simple, so try this expression,

 int num = (int) rand() % (max - min) + min;  
 //Where rand() returns a random number between 0.0 and 1.0
查看更多
倾城一夜雪
6楼-- · 2018-12-31 07:54

A fast, somewhat better than yours, but still not properly uniform distributed solution is

output = min + (rand() % static_cast<int>(max - min + 1))

Except when the size of the range is a power of 2, this method produces biased non-uniform distributed numbers regardless the quality of rand(). For a comprehensive test of the quality of this method, please read this.

查看更多
柔情千种
7楼-- · 2018-12-31 07:58

Let's split the problem into two parts:

  • Generate a random number n in the range 0 through (max-min).
  • Add min to that number

The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias to the first (RAND_MAX + 1) % (max-min+1) numbers. So if we could magically change RAND_MAX to RAND_MAX - (RAND_MAX + 1) % (max-min+1), there would no longer be any bias.

It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.

The running time is now geometrically distributed, with expected value 1/p where p is the probability of getting a small enough number on the first try. Since RAND_MAX - (RAND_MAX + 1) % (max-min+1) is always less than (RAND_MAX + 1) / 2, we know that p > 1/2, so the expected number of iterations will always be less than two for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.

EDIT:

Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling and it is often very difficult to see if it's correct or not.

查看更多
登录 后发表回答