Random number in the range 1 to sys.maxsize is alw

2019-02-07 16:22发布

I am trying to find the statistical properties of the PRNGs available in Python (2.7.10) by using the frequency test, runs test and the chi squared test.

For carrying out the frequency test, I need to convert the generated random number to its binary representation and then count the distribution of 1's and 0's. I was experimenting with the binary representation of the random numbers on the python console and observed this weird behavior:

>>> for n in random.sample(xrange(1, sys.maxsize), 50):
...     print '{0:b}'.format(n)
... 
101101110011011001110011110110101101101101111111101000000000001
110000101001001011101001110111111110011000101011100010000000001
110111101101110011100010001010000101011111110010001110000000001
100001111010011000101001000001000011001111100000001010000000001
1111000010010011111100111110110100100011110111010000000000001
111000001011101011101110100001001001000011011001110110000000001
1000100111011000111000101010000101010100110111000100000000001
11101001000001101111110101111011001000100011011011010000000001
110011010111101101011000110011011001110001111000001010000000001
110110110110111100011111110111011111101000011001100000000001
100010010000011101011100110101011110111100001100100000000000001
10111100011010011010001000101011001110010010000010010000000001
101011100110110001010110000101100000111111011101011000000000001
1111110010110010000111111000010001101011011010101110000000001
11100010101101110110101000101101011011111101101000010000000001
10011110110110010110011010000110010010111001111001010000000001
110110011100111010100111100100000100011101100001100000000000001
100110011001101011110011010101111101100010000111001010000000001
111000101101100111110010110110100110111001000101000000000000001
111111101000010111001011111100111100011101001011010000000001
11110001111100000111010010011111010101101110111001010000000001
100001100101101100010101111100111101111001101010101010000000001
11101010110011000001101110000000001111010001110111000000000001
100111000110111010001110110101001011100101111101010000000001
100001101100000011101101010101111111011010111110111110000000001
100010010011110110111111111000010001101100111001001100000000001
110011111110010011000110101010101001001010000100011010000000001
1111011010100001001101101000011100001011001110010100000000001
110110011101100101001100111010101111001011111101100000000000001
1010001110100101001001011111000111011100001100000110000000001
1000101110010011011000001011010110001000110100100100000000001
11111110011001011100111110110111000001000100100010000000000001
101111101010000101010111111111000001100101111001011110000000001
10010010111111111100000001010010101100111001100000000000001
111110000001110010001110111101110101010110001110000000000000001
100000101101000110101010010000101101000011111010001110000000001
101001011101100011001000011010010000000111110111100010000000001
10110101010000111010110111001111011000001111001100110000000001
10110111100100100011100101001100000000101110100100010000000001
10010111110001011101001110000111011010110100110111110000000001
111011110010110111011011101011001100001000111001010100000000001
101001010001010100010010010001100111101110101111000110000000001
101011111010000101010101000110001101001001011110000000000001
1010001010111101101010111110110110000001111101101110000000001
10111111111010001000110000101101010101011010101100000000001
101011101010110000001111010100100110000011111100100100000000001
111100001101111010100111010001010010000010110110010110000000001
100111111000100110100001110101000010111111010010010000000000001
100111100001011100011000000000101100111111000111100110000000001
110110100000110111011101110101101000101110111111010110000000001
>>> 

As you can see, all numbers end in 0000000001, i.e all numbers are 1 mod 2^10. Why is this so ?

Also, this behavior is observed when the range is 1 to sys.maxsize. If the range is specified to be 1 to 2^40, this is not observed. I want to know the reason for this behavior and whether there is anything wrong in my code.

The documentation for the random library that implements the PRNGs that I am using is here.

Let me know if I should provide any more information.

3条回答
霸刀☆藐视天下
2楼-- · 2019-02-07 16:40

@roeland hinted at the cause: in Python 2, sample() uses int(random.random() * n) repeatedly. Look at the source code (in your Python's Lib/random.py) for full details. In short, random.random() returns no more than 53 significant (non-zero) leading bits; then int() fills the rest of the low-order bits with zeroes (you're obviously on a machine where sys.maxsize == 2**63 - 1); then indexing your base (xrange(1, sys.maxsize)) by an even integer with "a lot" of of low-order 0 bits always returns an odd integer with the same number of low-order 0 bits (except for the last).

In Python 3 none of that happens - random in Python 3 uses stronger algorithms, and only falls back to random.random() when necessary. For example, here under Python 3.4.3:

>>> hex(random.randrange(10**70))
'0x91fc11ed768be3a454bd66f593c218d8bbfa3b99f6285291e1d9f964a9'
>>> hex(random.randrange(10**70))
'0x7b07ff02b6676801e33094fca2fcca7f6e235481c479c521643b1acaf4'

EDIT

Here's a more directly relevant example, under 3.4.3 on a 64-bit box:

>>> import random, sys
>>> sys.maxsize == 2**63 - 1
True
>>> for i in random.sample(range(1, sys.maxsize), 6):
...    print(bin(i))
0b10001100101001001111110110011111000100110100111001100000010110
0b100111100110110100111101001100001100110001110010000101101000101
0b1100000001110000110100111101101010110001100110101111011100111
0b111110100001111100101001001001101101100100011001001010100001110
0b1100110100000011100010000011010010100100110111001111100110100
0b10011010000110101010101110001000101110111100100001111101110111

Python 3 doesn't invoke random.random() at all in this case, but instead iteratively grabs chunks of 32 bits from the underlying Mersenne Twister (32-bit unsigned ints are "the natural" outputs from this implementation of MT) , pasting them together to build a suitable index. So, in Python 3, platform floats have nothing to do with it; in Python 2, quirks of float behavior have everything to do with it.

查看更多
小情绪 Triste *
3楼-- · 2019-02-07 16:50

That depends on a lot of things, like how exactly the RNG is implemented, how much bits of state it uses, and how exactly the sample function is implemented.

Here's what the documentation says:

Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1.

So if the sample indeed uses random() under the hood, then you should only expect 53 bits of meaningful bits in your result.

查看更多
我命由我不由天
4楼-- · 2019-02-07 16:55

It certainly looks like rounding error in random.sample.

The bottom 4 or so bits are always zero after the multiplication by the spread of the range (maxsize -1) then when the start of the range (1) is added they are always 1

if the multiplication was working correctly, given that the spread is not a power of two, and given that the random number only has 53 varying bits I'd expect to see varying values in the rightmost bits too.

查看更多
登录 后发表回答