I need a function which would generate a random integer in given range (including border values). I don't unreasonable quality/randomness requirements, I have four requirements:
- I need it to be fast. My project needs to generate millions (or sometimes even tens of millions) of random numbers and my current generator function has proven to be a bottleneck.
- I need it to be reasonably uniform (use of rand() is perfectly fine).
- the min-max ranges can be anything from <0, 1> to <-32727, 32727>.
- it has to be seedable.
I currently have following C++ code:
output = min + (rand() * (int)(max - min) / RAND_MAX)
The problem is, that it is not really uniform - max is returned only when rand() = RAND_MAX (for Visual C++ it is 1/32727). This is major issue for small ranges like <-1, 1>, where the last value is almost never returned.
So I grabbed pen and paper and came up with following formula (which builds on the (int)(n + 0.5) integer rounding trick):
But it still doesn't give me uniform distribution. Repeated runs with 10000 samples give me ratio of 37:50:13 for values values -1, 0. 1.
Could you please suggest better formula? (or even whole pseudo-random number generator function)
assume min and max are int values, [ and ] means include this value, ( and ) means not include this value, using above to get the right value using c++ rand()
reference: for ()[] define, visit:
https://en.wikipedia.org/wiki/Interval_(mathematics)
for rand and srand function or RAND_MAX define, visit:
http://en.cppreference.com/w/cpp/numeric/random/rand
[min, max]
(min, max]
[min, max)
(min, max)
Here is an unbiased version that generates numbers in
[low, high]
:If your range is reasonably small, there is no reason to cache the right-hand side of the comparison in the
do
loop.In this thread rejection sampling was already discussed, but I wanted to suggest one optimization based on the fact that
rand() % 2^something
does not introduce any bias as already mentioned above.The algorithm is really simple:
Here's my sample code:
This works well especially for small intervals, because the power of 2 will be "nearer" to the real interval length, and so the number of misses will be smaller.
PS
Obviously avoiding the recursion would be more efficient (no need to calculate over and over the log ceiling..) but I thought it was more readable for this example.
The formula for this is very simple, so try this expression,
A fast, somewhat better than yours, but still not properly uniform distributed solution is
Except when the size of the range is a power of 2, this method produces biased non-uniform distributed numbers regardless the quality of
rand()
. For a comprehensive test of the quality of this method, please read this.Let's split the problem into two parts:
n
in the range 0 through (max-min).The first part is obviously the hardest. Let's assume that the return value of rand() is perfectly uniform. Using modulo will add bias to the first
(RAND_MAX + 1) % (max-min+1)
numbers. So if we could magically changeRAND_MAX
toRAND_MAX - (RAND_MAX + 1) % (max-min+1)
, there would no longer be any bias.It turns out that we can use this intuition if we are willing to allow pseudo-nondeterminism into the running time of our algorithm. Whenever rand() returns a number which is too large, we simply ask for another random number until we get one which is small enough.
The running time is now geometrically distributed, with expected value
1/p
wherep
is the probability of getting a small enough number on the first try. SinceRAND_MAX - (RAND_MAX + 1) % (max-min+1)
is always less than(RAND_MAX + 1) / 2
, we know thatp > 1/2
, so the expected number of iterations will always be less than two for any range. It should be possible to generate tens of millions of random numbers in less than a second on a standard CPU with this technique.EDIT:
Although the above is technically correct, DSimon's answer is probably more useful in practice. You shouldn't implement this stuff yourself. I have seen a lot of implementations of rejection sampling and it is often very difficult to see if it's correct or not.