可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I was reading about python's random module in standard library. It amazes me that when I set the seed and produce a few random numbers:

random.seed(1)
for i in range(5):
    print random.random()

The numbers produced are exactly the same as the sample in the article. I think it's safe to say the algorithm is deterministic when the seed is set.

And when the seed is not set, the standard library seeds with time.time(). Now suppose an online service use random.random() to generate a captcha code, can a hacker use the same random generator to reproduce the captcha easily?

Let's assume the hacker knows about the algorithm to convert random number to captcha code. Otherwise, it seems quite impossible.
Since random.seed() is called when the module is imported, I assume for a web application, the time used as the seed is around the time the request is sent (within a few seconds), it won't be hard to caliberate with a few tries?

Am I worrying too much, or is this a real vulnerability?

回答1:

It shouldn't surprise you that the sequence is deterministic after seeding. That's the whole point of seeding. random.random is known as a PRNG, a pseudo- random number generator. This is not unique to Python, every language's simple random source is deterministic in this way.

And yes, people who are genuinely concerned about security will worry that an attacker could reproduce the sequence. That's why other sources of randomness are available, like os.urandom, but they are more expensive.

But the problem is not as bad as you say: for a web request, typically a process handles more than one request, so the module is initialized at some unknown point in the past, not when the web request was received.

回答2:

The existing answers are great, but I'll just add a few points.

Update:

Actually, if you don't supply a seed, the random number generator is seeded with random bits from the system random source, it only falls back to using the system time as a seed if the OS doesn't have a random source. Also note that recent versions of Python can use an improved seeding scheme. From the docs:

random.seed(a=None, version=2)

Initialize the random number generator.

If a is omitted or None, the current system time is used. If randomness sources are provided by the operating system, they are used instead of the system time (see the os.urandom() function for details on availability).

If a is an int, it is used directly.

With version 2 (the default), a str, bytes, or bytearray object gets converted to an int and all of its bits are used.

With version 1 (provided for reproducing random sequences from older versions of Python), the algorithm for str and bytes generates a narrower range of seeds.

Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.

Generating a CAPTCHA code is not a high-security application compared to say, generating secret cryptographic keys, especially keys that are intended to be used multiple times. As a corollary, the amount of entropy required to generate a CAPTCHA code is smaller than what's required for a cryptographic key.

Bear in mind that the system time used to seed random is (probably) not the system time in seconds - it's more likely to be the time in microseconds, or even nanoseconds, so it's not easy for an attacker to figure the seed out from a brute-search, apart from the considerations mentioned by Ned.

Here's a quick demo, running on Python 2.6.6 on a 2GHz Linux system.

#!/usr/bin/env python
''' random seeding demo'''

from __future__ import print_function
import time
from random import seed, randint, random

def rf():
    return randint(10, 99)

def put_time():
    print('%.15f' % time.time())

r = range(10)
a = []

put_time()
for i in r:
    seed()
    a.append([rf() for j in r])
put_time()

for row in a:
    print(row)

Typical output

1436617059.071794986724854
1436617059.074091911315918
[95, 25, 50, 75, 80, 38, 21, 26, 85, 82]
[75, 96, 14, 13, 76, 53, 94, 68, 80, 66]
[79, 33, 65, 86, 12, 32, 80, 83, 36, 42]
[28, 47, 62, 21, 52, 30, 54, 62, 22, 28]
[22, 40, 71, 36, 78, 64, 17, 33, 99, 43]
[81, 15, 32, 15, 63, 57, 83, 67, 12, 62]
[22, 56, 54, 55, 51, 56, 34, 56, 94, 16]
[64, 82, 37, 80, 70, 91, 56, 41, 55, 12]
[47, 37, 64, 14, 69, 65, 42, 17, 22, 17]
[43, 43, 73, 82, 61, 55, 32, 52, 86, 74]

As you can see, less than 3 milliseconds elapse between the start of the outer loop & its end, but all of the lists in a are quite different.

Note that the seed passed to random.seed() can be any hashable object, and when you pass it a non-integer (eg a float like the system time), it first gets hashed to create an integer.

Still, there's no need to merely use the system time as the seed: you can use SystemRandom / os.urandom() to get the seed. That way, the seed is more unpredictable, but you get the speed of Mersenne Twister; SystemRandom is a little slower than Mersenne Twister because it has to make system calls. However, even urandom isn't totally safe.

From the GNU urandom man page:

The random number generator gathers environmental noise from device drivers and other sources into an entropy pool. The generator also keeps an estimate of the number of bits of noise in the entropy pool. From this entropy pool random numbers are created.

When read, the /dev/random device will only return random bytes within the estimated number of bits of noise in the entropy pool. /dev/random should be suitable for uses that need very high quality randomness such as one-time pad or key generation. When the entropy pool is empty, reads from /dev/random will block until additional environmental noise is gathered.

A read from the /dev/urandom device will not block waiting for more entropy. As a result, if there is not sufficient entropy in the entropy pool, the returned values are theoretically vulnerable to a cryptographic attack on the algorithms used by the driver. Knowledge of how to do this is not available in the current unclassified literature, but it is theoretically possible that such an attack may exist. If this is a concern in your application, use /dev/random instead.

Usage

If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys.

回答3:

Almost all module functions depend on the basic function random(), which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes.

See this answer for secure random.

回答4:

The Python documentation has this to say:

Warning The pseudo-random generators of this module should not be used for security purposes. Use os.urandom() or SystemRandom if you require a cryptographically secure pseudo-random number generator.

So, using it for CAPTCHA is not likely to be a good idea.