I was reading about python's random module in standard library. It amazes me that when I set the seed and produce a few random numbers:
random.seed(1)
for i in range(5):
print random.random()
The numbers produced are exactly the same as the sample in the article. I think it's safe to say the algorithm is deterministic when the seed is set.
And when the seed is not set, the standard library seeds with time.time()
.
Now suppose an online service use random.random()
to generate a captcha code, can a hacker use the same random generator to reproduce the captcha easily?
- Let's assume the hacker knows about the algorithm to convert random number to captcha code. Otherwise, it seems quite impossible.
- Since random.seed() is called when the module is imported, I assume for a web application, the time used as the seed is around the time the request is sent (within a few seconds), it won't be hard to caliberate with a few tries?
Am I worrying too much, or is this a real vulnerability?
It shouldn't surprise you that the sequence is deterministic after seeding. That's the whole point of seeding. random.random
is known as a PRNG, a pseudo- random number generator. This is not unique to Python, every language's simple random source is deterministic in this way.
And yes, people who are genuinely concerned about security will worry that an attacker could reproduce the sequence. That's why other sources of randomness are available, like os.urandom
, but they are more expensive.
But the problem is not as bad as you say: for a web request, typically a process handles more than one request, so the module is initialized at some unknown point in the past, not when the web request was received.
The existing answers are great, but I'll just add a few points.
Update:
Actually, if you don't supply a seed, the random number generator is seeded with random bits from the system random source, it only falls back to using the system time as a seed if the OS doesn't have a random source. Also note that recent versions of Python can use an improved seeding scheme. From the docs:
random.seed(a=None, version=2)
Initialize the random number generator.
If a
is omitted or None
, the current system time is used. If
randomness sources are provided by the operating system, they are used
instead of the system time (see the os.urandom()
function for
details on availability).
If a
is an int, it is used directly.
With version 2 (the default), a str, bytes, or bytearray object gets
converted to an int and all of its bits are used.
With version 1 (provided for reproducing random sequences from older
versions of Python), the algorithm for str and bytes generates a
narrower range of seeds.
Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed.
Generating a CAPTCHA code is not a high-security application compared to say, generating secret cryptographic keys, especially keys that are intended to be used multiple times. As a corollary, the amount of entropy required to generate a CAPTCHA code is smaller than what's required for a cryptographic key.
Bear in mind that the system time used to seed random
is (probably) not the system time in seconds - it's more likely to be the time in microseconds, or even nanoseconds, so it's not easy for an attacker to figure the seed out from a brute-search, apart from the considerations mentioned by Ned.
Here's a quick demo, running on Python 2.6.6 on a 2GHz Linux system.
#!/usr/bin/env python
''' random seeding demo'''
from __future__ import print_function
import time
from random import seed, randint, random
def rf():
return randint(10, 99)
def put_time():
print('%.15f' % time.time())
r = range(10)
a = []
put_time()
for i in r:
seed()
a.append([rf() for j in r])
put_time()
for row in a:
print(row)
Typical output
1436617059.071794986724854
1436617059.074091911315918
[95, 25, 50, 75, 80, 38, 21, 26, 85, 82]
[75, 96, 14, 13, 76, 53, 94, 68, 80, 66]
[79, 33, 65, 86, 12, 32, 80, 83, 36, 42]
[28, 47, 62, 21, 52, 30, 54, 62, 22, 28]
[22, 40, 71, 36, 78, 64, 17, 33, 99, 43]
[81, 15, 32, 15, 63, 57, 83, 67, 12, 62]
[22, 56, 54, 55, 51, 56, 34, 56, 94, 16]
[64, 82, 37, 80, 70, 91, 56, 41, 55, 12]
[47, 37, 64, 14, 69, 65, 42, 17, 22, 17]
[43, 43, 73, 82, 61, 55, 32, 52, 86, 74]
As you can see, less than 3 milliseconds elapse between the start of the outer loop & its end, but all of the lists in a
are quite different.
Note that the seed passed to random.seed()
can be any hashable object, and when you pass it a non-integer (eg a float
like the system time), it first gets hashed to create an integer.
Still, there's no need to merely use the system time as the seed: you can use SystemRandom
/ os.urandom()
to get the seed. That way, the seed is more unpredictable, but you get the speed of Mersenne Twister; SystemRandom
is a little slower than Mersenne Twister because it has to make system calls. However, even urandom
isn't totally safe.
From the GNU urandom man page:
The random number generator gathers environmental noise from device
drivers and other sources into an entropy pool. The generator also
keeps an estimate of the number of bits of noise in the entropy pool.
From this entropy pool random numbers are created.
When read, the /dev/random device will only return random bytes
within the estimated number of bits of noise in the entropy pool.
/dev/random should be suitable for uses that need very high quality
randomness such as one-time pad or key generation. When the entropy
pool is empty, reads from /dev/random will block until additional
environmental noise is gathered.
A read from the /dev/urandom device will not block waiting for more
entropy. As a result, if there is not sufficient entropy in the
entropy pool, the returned values are theoretically vulnerable to a
cryptographic attack on the algorithms used by the driver. Knowledge
of how to do this is not available in the current unclassified
literature, but it is theoretically possible that such an attack may
exist. If this is a concern in your application, use /dev/random
instead.
Usage
If you are unsure about whether you should use
/dev/random or /dev/urandom, then probably you want to use the latter.
As a general rule, /dev/urandom should be used for everything except
long-lived GPG/SSL/SSH keys.