Generating uniform random integers with a certain maximum

Published 2019-02-07 15:42

I want to generate uniform integers that satisfy 0 <= result <= maxValue.

I already have a generator that returns uniform values in the full range of the built-in unsigned integer types. Let's call the methods for this byte Byte(), ushort UInt16(), uint UInt32(), and ulong UInt64(). Assume that the results of these methods are perfectly uniform.

The signatures of the methods I want are uint UniformUInt(uint maxValue) and ulong UniformUInt(ulong maxValue).

What I'm looking for:

  1. Correctness
    I'd prefer the return values to be uniformly distributed over the given interval.
    But a very small bias is acceptable if it increases performance significantly. By that I mean a bias of an order that allows a distinguisher to succeed with probability 2/3 given 2^64 values.
    It must work correctly for any maxValue.
  2. Performance
    The method should be fast.
  3. Efficiency
    The method should consume little raw randomness, since, depending on the underlying generator, generating the raw bytes might be costly. Wasting a few bits is fine, but consuming, say, 128 bits to generate a single number is probably excessive.

It's also possible to cache some leftover randomness from the previous call in some member variables.
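For instance, here's a minimal sketch of such a cache, assuming the UInt64() method above (the field and method names are made up):

ulong _byteBuffer;     // leftover random bytes from the last UInt64() call
int _byteBufferCount;  // number of unused bytes remaining in _byteBuffer

byte CachedByte()
{
    // refill the cache with 8 fresh bytes when it runs dry
    if (_byteBufferCount == 0)
    {
        _byteBuffer = UInt64();
        _byteBufferCount = 8;
    }
    byte result = (byte)_byteBuffer;  // take the low byte
    _byteBuffer >>= 8;
    _byteBufferCount--;
    return result;
}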

Be careful with integer overflows and wrapping behavior.
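For example, the obvious way to compute the number of possible results wraps silently (uint arithmetic in C# is unchecked by default):

uint count = maxValue + 1; // wraps to 0 when maxValue == uint.MaxValue

so maxValue == uint.MaxValue usually needs its own code path.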

I already have a solution (I'll post it as an answer), but it's a bit ugly for my taste, so I'd like to get ideas for better solutions.

Suggestions on how to unit test with large maxValues would be nice too, since I can't generate a histogram with 2^64 buckets and fill it with 2^74 random values. Another complication is that with certain bugs, only some choices of maxValue give a strongly biased distribution, while others are biased only very slightly.

3 Answers
太酷不给撩
#2 · 2019-02-07 16:03

My current solution. A bit ugly for my taste. It also has two divisions per generated number, which might negatively impact performance (I haven't profiled this part yet).

uint UniformUInt(uint maxResult)
{
    uint rand;
    uint count = maxResult + 1;
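    // note: this wraps to 0 when maxResult == uint.MaxValue, but that case
    // takes the final else branch below and never reads count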

    if (maxResult < 0x100)
    {
        uint usefulCount = (0x100 / count) * count;
        do
        {
            rand = Byte();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else if (maxResult < 0x10000)
    {
        uint usefulCount = (0x10000 / count) * count;
        do
        {
            rand = UInt16();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else if (maxResult != uint.MaxValue)
    {
        uint usefulCount = (uint.MaxValue / count) * count; // uses uint.MaxValue instead of 2^32, reducing the upper bound by 1, to avoid 64-bit division
        do
        {
            rand = UInt32();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else
    {
        return UInt32();
    }
}

ulong UniformUInt(ulong maxResult)
{
    if (maxResult < 0x100000000)
        return UniformUInt((uint)maxResult);
    else if (maxResult < ulong.MaxValue)
    {
        ulong rand;
        ulong count = maxResult + 1;
        ulong usefulCount = (ulong.MaxValue / count) * count; // uses ulong.MaxValue instead of 2^64 (which ulong can't represent), reducing the upper bound by 1
        do
        {
            rand = UInt64();
        } while (rand >= usefulCount);
        return rand % count;
    }
    else
        return UInt64();
}
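A quick usage sketch (hypothetical; this only sanity-checks a small maxValue, not the large values the question asks about testing):

var histogram = new int[7];
for (int i = 0; i < 7000000; i++)
    histogram[UniformUInt(6u)]++;
// each of the 7 buckets should end up near 1,000,000; large deviations indicate bias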
贪生不怕死
#3 · 2019-02-07 16:28

How about something like this as a general-purpose solution? The algorithm is based on the one used by Java's nextInt method, rejecting any values that would cause a non-uniform distribution. As long as the output of your UInt32 method is perfectly uniform, this should be too.

uint UniformUInt(uint inclusiveMaxValue)
{
    unchecked
    {
        uint exclusiveMaxValue = inclusiveMaxValue + 1;

        // if exclusiveMaxValue is a power of two then we can just use a mask
        // also handles the edge case where inclusiveMaxValue is uint.MaxValue
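        // note: ~exclusiveMaxValue + 1 is the two's-complement negation, so this
        // tests x & -x == x, which holds exactly for powers of two (and for zero)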
        if ((exclusiveMaxValue & (~exclusiveMaxValue + 1)) == exclusiveMaxValue)
            return UInt32() & inclusiveMaxValue;

        uint bits, val;
        do
        {
            bits = UInt32();
            val = bits % exclusiveMaxValue;

            // if (bits - val + inclusiveMaxValue) overflows then val has been
            // taken from an incomplete chunk at the end of the range of bits
            // in that case we reject it and loop again
        } while (bits - val + inclusiveMaxValue < inclusiveMaxValue);

        return val;
    }
}

The rejection process could, theoretically, keep looping forever; in practice each iteration accepts with probability greater than 1/2, so performance should be pretty good. It's difficult to suggest any generally applicable optimisations without knowing (a) the expected usage patterns, and (b) the performance characteristics of your underlying RNG.

For example, if most callers will be specifying a max value <= 255, then it might not make sense to ask for four bytes of randomness every time; a sketch of that idea follows below. On the other hand, the performance benefit of requesting fewer bytes might be outweighed by the additional cost of always checking how many you actually need. (And, of course, once you do have specific information then you can keep optimising and testing until your results are good enough.)
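Here is that sketch, assuming the Byte() method from the question (the method name is made up; it applies the same rejection test to a single random byte):

uint UniformSmallUInt(uint inclusiveMaxValue) // assumes inclusiveMaxValue < 0x100
{
    uint exclusiveMaxValue = inclusiveMaxValue + 1;
    uint bits, val;
    do
    {
        bits = Byte();
        val = bits % exclusiveMaxValue;
        // reject values drawn from the incomplete chunk at the top of the byte range
    } while (bits - val + inclusiveMaxValue >= 0x100);
    return val;
}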

Evening l夕情丶
#4 · 2019-02-07 16:28

I am not sure that this is an answer. It definitely needs more space than a comment, so I have to write it here, but I am willing to delete it if others think it is stupid.

From the original question I gather that:

  1. Entropy bits are very expensive
  2. Everything else should be considered expensive, but less so than entropy.

My idea is to use random bits to halve, quarter, ... the maxValue space until it is reduced to a single number. Something like the following.

I'll use maxValue = 333 (decimal) as an example and assume a function getBit() that uniformly returns 0 or 1:

offset := 0
space := maxValue

while (space > 0)
{
  // Right-shift the value, keeping the rightmost bit. This should be
  // efficient on x86 and x64, if coded in real code rather than pseudocode.
  remains := space & 1
  part := floor(space / 2)
  space := part

  // In the 333 example, part is now 166, but 2*166 = 332. If we were to simply choose
  // one half of the space, we would be heavily biased towards the upper half, so in
  // case we have a remainder, we consume a bit of entropy to decide which half is bigger.
  if (remains)
    if (getBit())
      part++

  // Now we decide which half to choose, consuming a bit of entropy.
  if (getBit())
    offset += part

  // Exit condition: the remaining space = 0 is guaranteed to be reached.
  // In the 333 example, offset will be 0, 166 or 167, and the remaining space will be 166.
}

randomResult := offset

getBit() can either come directly from your entropy source, if it is bit-based, or be implemented by consuming n bits of entropy at once on the first call (with n being whatever is optimal for your entropy source) and shifting them out until the buffer is empty.
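A minimal sketch of that second option in C#, assuming the UInt64() method from the question (the field and method names are made up):

ulong _bitBuffer;  // unused random bits from the last UInt64() call
int _bitCount;     // number of valid bits remaining in _bitBuffer

int GetBit()
{
    // refill the buffer with 64 fresh bits when it is empty
    if (_bitCount == 0)
    {
        _bitBuffer = UInt64();
        _bitCount = 64;
    }
    int bit = (int)(_bitBuffer & 1);  // take the lowest bit
    _bitBuffer >>= 1;
    _bitCount--;
    return bit;
}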
