I want to generate uniform integers that satisfy `0 <= result <= maxValue`.
I already have a generator that returns uniform values in the full range of the built-in unsigned integer types. Let's call these methods `byte Byte()`, `ushort UInt16()`, `uint UInt32()` and `ulong UInt64()`. Assume that the result of these methods is perfectly uniform.
The signatures of the methods I want are `uint UniformUInt(uint maxValue)` and `ulong UniformUInt(ulong maxValue)`.
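
To make the setup concrete, here is the assumed API as a C# sketch; the interface name `IRawRandom` is mine, purely for illustration, while the method names come straight from the question.

```csharp
// Assumed raw generator; every method returns a perfectly uniform value
// over the full range of its return type.
public interface IRawRandom
{
    byte Byte();
    ushort UInt16();
    uint UInt32();
    ulong UInt64();
}

// Desired methods (both inclusive of maxValue):
//   uint  UniformUInt(uint maxValue);
//   ulong UniformUInt(ulong maxValue);
```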
What I'm looking for:

- Correctness: I'd prefer the return values to be uniformly distributed over the given interval, but a very small bias is acceptable if it increases performance significantly. By that I mean a bias of an order that allows a distinguisher to succeed with probability 2/3 given 2^64 values. It must work correctly for any `maxValue`.
- Performance: The method should be fast.
- Efficiency: The method should consume little raw randomness since, depending on the underlying generator, generating the raw bytes might be costly. Wasting a few bits is fine, but consuming, say, 128 bits to generate a single number is probably excessive.
It's also possible to cache some leftover randomness from the previous call in member variables.

Be careful with integer overflows and wrapping behavior.
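
For instance (my illustration of the warning, not code from the question):

```csharp
// The classic trap: the range size maxValue + 1 doesn't fit in the same type.
static void RangeSizeExample(uint maxValue)
{
    uint broken = maxValue + 1;        // wraps to 0 when maxValue == uint.MaxValue
                                       // (or throws in a checked context)
    ulong safe = (ulong)maxValue + 1;  // widen before adding; always correct
    // For the ulong overload there is no wider type, so
    // maxValue == ulong.MaxValue needs an explicit special case.
}
```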
I already have a solution (I'll post it as an answer), but it's a bit ugly for my tastes, so I'd like to get ideas for better solutions.
Suggestions on how to unit test with large `maxValue`s would be nice too, since I can't generate a histogram with 2^64 buckets and 2^74 random values. Another complication is that with certain bugs, only some `maxValue`s produce heavily biased distributions, while others are biased only very slightly.
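
One approach that scales (a sketch; the bin count, sample size, and the particular `maxValue` are arbitrary choices of mine): fold the results into a small number of equal-width bins instead of 2^64 buckets, and run the test for several structurally different `maxValue`s (e.g. 2^k - 1, 2^k, 2^k + 1), since bugs may bias only some of them.

```csharp
using System;

static class HistogramTest
{
    // Coarse-bin histogram check for a huge maxValue. All bins have equal
    // width except possibly the last (negligibly narrower at this scale),
    // so every count should be close to Samples / Bins; a chi-squared test
    // or a simple relative-deviation check then flags gross bias.
    static void TestCoarseHistogram(Func<ulong, ulong> uniformUInt)
    {
        const int Bins = 256;
        const int Samples = 10_000_000;
        ulong maxValue = ulong.MaxValue - 12345;  // awkward, near-2^64 range
        ulong binWidth = maxValue / Bins + 1;     // ensures result / binWidth < Bins

        var counts = new long[Bins];
        for (int i = 0; i < Samples; i++)
            counts[(int)(uniformUInt(maxValue) / binWidth)]++;

        double expected = (double)Samples / Bins;
        for (int b = 0; b < Bins; b++)
            Console.WriteLine($"bin {b}: {counts[b]} (expected ~{expected:F0})");
    }
}
```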
My current solution. A bit ugly for my tastes. It also has two divisions per generated number, which might negatively impact performance (I haven't profiled this part yet).
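
The code itself isn't reproduced here; a rejection-style method with two divisions per call, consistent with the description above (but not necessarily the author's actual code), might look like this:

```csharp
// Sketch of a two-division rejection approach, assuming the question's
// UInt32() raw generator is available.
public uint UniformUInt(uint maxValue)
{
    ulong range = (ulong)maxValue + 1;   // widen first so this never overflows
    ulong groups = (1UL << 32) / range;  // division #1: complete copies of the
                                         // range that fit into 2^32 raw values
    ulong limit = groups * range;        // raw values >= limit would be biased
    ulong raw;
    do
    {
        raw = UInt32();                  // retry until we land in a complete copy
    } while (raw >= limit);
    return (uint)(raw / groups);         // division #2: map down to [0, maxValue]
}
```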
How about something like this as a general-purpose solution? The algorithm is based on the one used by Java's `nextInt` method, rejecting any values that would cause a non-uniform distribution. So long as the output of your `UInt32` method is perfectly uniform, this should be too.

The rejection process could, theoretically, keep looping forever; in practice the performance should be pretty good. It's difficult to suggest any generally applicable optimisations without knowing (a) the expected usage patterns, and (b) the performance characteristics of your underlying RNG. For example, if most callers will be specifying a max value <= 255, then it might not make sense to ask for four bytes of randomness every time. On the other hand, the performance benefit of requesting fewer bytes might be outweighed by the additional cost of always checking how many you actually need. (And, of course, once you do have specific information, you can keep optimising and testing until your results are good enough.)
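
The answer's code was not included above; a minimal C# sketch of the Java-style rejection loop it describes, assuming the question's `UInt32()` generator, might look like this:

```csharp
// Rejection sampling in the style of Java's Random.nextInt(bound):
// take raw % n, but reject raw values from the final, incomplete copy
// of the range so every result is equally likely.
public uint UniformUInt(uint maxValue)
{
    if (maxValue == uint.MaxValue)
        return UInt32();                 // full range: every raw value is fine

    uint n = maxValue + 1;               // size of the target range
    uint bits, val;
    do
    {
        bits = UInt32();
        val = bits % n;
        // bits - val is the start of the copy of the range that bits fell
        // into; reject if that copy doesn't fit completely below 2^32.
    } while (bits - val > uint.MaxValue - (n - 1));
    return val;
}
```

The `ulong` overload is identical in shape, with `UInt64()` and `ulong.MaxValue` substituted.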
I am not sure that this is an answer. It definitely needs more space than a comment, so I have to write it here, but I am willing to delete it if others think this is stupid.

From the OQ I gather that raw randomness is costly and should be conserved.
My idea is to use random bits to halve, quarter, ... the `maxValue` space until it is reduced to a single number. Something like this:
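
Here is my reconstruction of that idea as a C# sketch (the answer's own code wasn't included); it draws bits most-significant first, so each bit halves the remaining candidate space, and it uses the `getBit()` helper described in the next paragraph:

```csharp
public uint UniformUInt(uint maxValue)
{
    // Count the bits needed to represent maxValue
    // (9 for the maxValue = 333 example below, since 333 < 512).
    int bits = 0;
    for (uint m = maxValue; m != 0; m >>= 1)
        bits++;

    while (true)                          // rejection loop
    {
        uint result = 0;
        bool rejected = false;
        for (int i = bits - 1; i >= 0; i--)
        {
            result = (result << 1) | getBit();
            // Even if all remaining bits were 0 the final value would
            // exceed maxValue: restart early instead of wasting bits.
            if (result > (maxValue >> i)) { rejected = true; break; }
        }
        if (!rejected)
            return result;                // uniform over [0, maxValue]
    }
}
```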
I'll use maxValue = 333 (decimal) as an example and assume a function `getBit()` that randomly returns 0 or 1. `getBit()` can either come directly from your entropy source, if it is bit-based, or consume n bits of entropy at once on the first call (obviously with n being the optimum for your entropy source) and shift them out until the buffer is empty.
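
A sketch of that caching variant, with n = 32 and the question's `UInt32()` as the source (both choices are my assumptions, not fixed by the answer):

```csharp
private uint bitBuffer;   // cached raw bits from the last UInt32() call
private int bitsLeft;     // how many of those bits are still unused

private uint getBit()
{
    if (bitsLeft == 0)
    {
        bitBuffer = UInt32();   // refill the cache: 32 fresh bits
        bitsLeft = 32;
    }
    uint bit = bitBuffer & 1;   // hand out the lowest unused bit
    bitBuffer >>= 1;
    bitsLeft--;
    return bit;
}
```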