Using bitwise & instead of modulus operator to ran

Posted 2019-08-04 23:48

Question:

I need to randomly sample from a uniform distribution of integers over the interval [LB,UB] in C++. To do so, I start with a "good" random number (RN) generator (from Numerical Recipes, 3rd ed.) that samples 64-bit integers uniformly at random; let's call it int64().

Using the mod operator, I can sample from the integers in [LB,UB] by:

LB+int64()%(UB-LB+1);
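A minimal sketch of the modulus mapping, assuming the raw value r stands in for one call to int64() (the Numerical Recipes generator itself is not reproduced here). The name map_mod is hypothetical. It uses unsigned arithmetic so the remainder can never be negative, which the one-liner above would not guarantee if int64() returned signed values.

```cpp
#include <cassert>
#include <cstdint>

// Modulus method: map one raw 64-bit random value r (standing in for a
// single call to int64()) onto the closed interval [lb, ub].
// Unsigned arithmetic keeps the remainder non-negative.
// Assumes lb <= ub and that ub - lb + 1 fits in 64 bits without wrapping to 0.
std::int64_t map_mod(std::uint64_t r, std::int64_t lb, std::int64_t ub) {
    assert(lb <= ub);
    const std::uint64_t size = static_cast<std::uint64_t>(ub - lb) + 1;
    return lb + static_cast<std::int64_t>(r % size);
}
```

For [20,50] the interval size is 31, so a raw value of 45 maps to 20 + 45 % 31 = 34.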

The only problem with the mod operator is that integer division is slow. So I then tried the method suggested here, which is:

LB + (int64()&(UB-LB))

The bitwise & method is about 3 times as fast. This is huge for me, because one of my simulations in C++ needs to randomly sample about 20 million integers.

But there's one big problem. When I analyze the integers sampled using the bitwise & method, they don't appear uniformly distributed over the interval [LB,UB]. The integers do fall within [LB,UB], but only on the even integers in that range. For example, here is a histogram of 5000 integers sampled from [20,50] using the bitwise & method:

By comparison, here is what a similar histogram looks like when using the mod operator method, which of course works fine:

What's wrong with my bitwise & method? Is there any way to modify it so that both even and odd numbers are sampled over the defined interval?

Answer 1:

The bitwise & operator looks at each pair of corresponding bits in its operands, ANDs just those two bits together, and puts the result in the corresponding bit of the output.

So if the lowest bit of UB-LB is 0, the lowest bit of the result is 0 as well. That is to say, if UB-LB is even, every offset int64()&(UB-LB) is even, so every sample has the same parity as LB.
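To make the parity argument concrete, here is a small sketch of the question's & mapping, again assuming r stands in for one output of int64(). The names map_and and all_offsets_even are hypothetical helpers for illustration. With [20,50] the mask is 30 (binary 11110), whose lowest bit is 0, so no offset can ever be odd.

```cpp
#include <cassert>
#include <cstdint>

// The bitwise-AND mapping from the question: lb + (r & (ub - lb)).
// A result bit can be 1 only where the mask ub - lb has a 1 bit.
std::int64_t map_and(std::uint64_t r, std::int64_t lb, std::int64_t ub) {
    assert(lb <= ub);
    const std::uint64_t mask = static_cast<std::uint64_t>(ub - lb);
    return lb + static_cast<std::int64_t>(r & mask);
}

// Returns true if every raw value r in [0, limit) maps to an even offset
// from lb, i.e. map_and(r, lb, ub) - lb is even for all of them.
bool all_offsets_even(std::int64_t lb, std::int64_t ub, std::uint64_t limit) {
    for (std::uint64_t r = 0; r < limit; ++r) {
        if (((map_and(r, lb, ub) - lb) & 1) != 0) return false;
    }
    return true;
}
```

For example, 45 is 101101 in binary and 30 is 011110, so 45 & 30 is 001100 = 12 and map_and(45, 20, 50) yields 32. Widening the interval to [20,51] makes the mask 31 (odd), and odd offsets appear again.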

& is the wrong tool for this unless UB-LB+1 is a power of 2. If you want to find a remainder, there is no general shortcut: the compiler already implements % in the fastest way it knows.
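A sketch checking the power-of-2 exception, where n plays the role of UB-LB+1. For unsigned r and a power-of-two n, the low bits of r are exactly the remainder, so r % n and r & (n - 1) agree; for any other n they do not. The helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// True when n is a power of two (n != 0 and exactly one bit set).
constexpr bool is_power_of_two(std::uint64_t n) {
    return n != 0 && (n & (n - 1)) == 0;
}

// Checks whether r % n == r & (n - 1) for every unsigned r in [0, limit).
bool and_matches_mod(std::uint64_t n, std::uint64_t limit) {
    assert(n != 0);
    for (std::uint64_t r = 0; r < limit; ++r) {
        if ((r % n) != (r & (n - 1))) return false;
    }
    return true;
}
```

With n = 32 the two expressions match everywhere; with n = 31 they already differ at r = 1 (1 % 31 is 1, but 1 & 30 is 0). In C++20, std::has_single_bit from <bit> performs the same power-of-two test.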

Note that I said no general shortcut. For particular values of UB-LB known at compile time, there can be faster ways. And if you can somehow arrange for UB and LB to be values the compiler can compute at compile time, it will use those faster ways when you write %.
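One way to make the bounds known at compile time is to pass them as template parameters. This is a sketch under that assumption (map_mod_fixed is a hypothetical name, and it needs C++14 for the local constexpr variable). With a constant divisor the compiler is free to replace the division behind % with a cheaper multiply-and-shift sequence, but whether it actually does depends on the compiler and optimisation level.

```cpp
#include <cstdint>

// Interval [LB, UB] fixed at compile time, so the divisor below is a
// constant expression the optimiser can specialise for.
template <std::uint64_t LB, std::uint64_t UB>
constexpr std::uint64_t map_mod_fixed(std::uint64_t r) {
    static_assert(LB <= UB, "empty interval");
    constexpr std::uint64_t size = UB - LB + 1;  // 31 for [20, 50]
    return LB + r % size;
}
```

The results are identical to the runtime modulus; only the generated code may differ.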

By the way, using % does not in fact produce uniformly distributed integers over the range unless the size of the range is a power of 2. Otherwise there must be a slight bias in favour of certain values, because the 2^64 possible outputs of int64() cannot be divided evenly among the values in the desired range. The bias may be too small to affect your particular simulation, but bad random number generators have broken random simulations in the past, and will do so again.
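The bias is easiest to see with a deliberately tiny generator. This sketch assumes, purely for illustration, a generator with only 8 equally likely outputs (0..7) instead of 2^64, reduced with % 3. It also shows rejection sampling, a standard fix that is not proposed in the answer above: discard raw values at or above the largest multiple of the range size, then reduce. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Counts how many of the 8 toy outputs (0..7) land on each of 0, 1, 2
// under r % 3. Result is {3, 3, 2}: 0 and 1 are favoured over 2.
std::array<int, 3> toy_mod_counts() {
    std::array<int, 3> counts{};
    for (std::uint32_t r = 0; r < 8; ++r) {
        ++counts[r % 3];
    }
    return counts;
}

// Rejection sampling on the same toy range: accept r only if it is below
// the largest multiple of size that is <= 8, then reduce with %.
// For size 3 that multiple is 6, so r = 6 and r = 7 are rejected.
// Returns -1 to signal "rejected, draw again".
int toy_reject_then_mod(std::uint32_t r, std::uint32_t size) {
    assert(size > 0 && size <= 8 && r < 8);
    const std::uint32_t limit = 8 - (8 % size);
    if (r >= limit) return -1;
    return static_cast<int>(r % size);
}
```

After rejection, each of 0, 1, 2 is produced by exactly two of the eight raw values. With a real 64-bit generator and a small range the same imbalance exists, it is just far smaller.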

If you want a uniform random number distribution over an arbitrary range, use std::uniform_int_distribution from C++11 (header <random>), or the class of the same name in Boost (boost::random::uniform_int_distribution).
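A minimal sketch of the standard-library route for the question's [20,50] example. It assumes std::mt19937_64 as the engine because it ships with <random>; the Numerical Recipes generator is not used here, and sample_uniform is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw count integers uniformly from [lb, ub] with the standard library.
// std::mt19937_64 is an assumed engine choice; any engine meeting the
// UniformRandomBitGenerator requirements would work.
std::vector<std::int64_t> sample_uniform(std::int64_t lb, std::int64_t ub,
                                         std::size_t count, std::uint64_t seed) {
    assert(lb <= ub);  // uniform_int_distribution requires a <= b
    std::mt19937_64 gen(seed);
    std::uniform_int_distribution<std::int64_t> dist(lb, ub);
    std::vector<std::int64_t> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        out.push_back(dist(gen));
    }
    return out;
}
```

If you want to keep int64() as the source of randomness, wrapping it in a small class that provides result_type, static min() and max(), and operator() should satisfy the UniformRandomBitGenerator requirements, so the same distribution object can draw from it.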



Answer 2:

This works well if the range difference (UB-LB) is 2^n-1, but won't work well at all if it is, for example, 2^n.
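A quick sketch contrasting the two cases, where mask plays the role of UB-LB. With mask = 15 = 2^4-1 every value 0..15 is reachable; with mask = 16 = 2^4 only 0 and 16 are. The helper name is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Collects the distinct values of r & mask for r in [0, limit).
// Precondition: limit > mask, so every reachable result can appear.
std::set<std::uint64_t> distinct_and_results(std::uint64_t mask, std::uint64_t limit) {
    assert(limit > mask);
    std::set<std::uint64_t> seen;
    for (std::uint64_t r = 0; r < limit; ++r) {
        seen.insert(r & mask);
    }
    return seen;
}
```

Scanning 256 raw values gives 16 distinct results for mask 15, but only two for mask 16.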



Answer 3:

The two are equivalent only when the size of the interval is a power of two, and even then only for non-negative values. In general, y%x and y&(x-1) are not the same.

For example, y%5 produces numbers from 0 to 4 (or from 0 down to -4 for negative y), but y&4 produces either 0 or 4, never 1, 2, or 3, because of how bitwise operators work: 4 is 100 in binary, so only that single bit of y can survive the AND.
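A small sketch enumerating both expressions over a range of y, including negative values. It relies on C++ truncating integer division toward zero (guaranteed since C++11), so a negative y gives a remainder that is negative or 0, and on two's complement representation for y & 4 (guaranteed since C++20 and universal in practice before that). The function names are hypothetical.

```cpp
#include <cassert>
#include <set>

// Distinct results of y % 5 for y in [lo, hi].
std::set<int> distinct_mod5(int lo, int hi) {
    assert(lo <= hi);
    std::set<int> seen;
    for (int y = lo; y <= hi; ++y) seen.insert(y % 5);
    return seen;
}

// Distinct results of y & 4 for y in [lo, hi]; only bit 2 can be set.
std::set<int> distinct_and4(int lo, int hi) {
    assert(lo <= hi);
    std::set<int> seen;
    for (int y = lo; y <= hi; ++y) seen.insert(y & 4);
    return seen;
}
```

Over 0..99, y % 5 takes all of 0..4 while y & 4 takes only 0 and 4; over -99..-1, y % 5 takes -4..0 while y & 4 is still only 0 or 4.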