I was reading about Python's random module in the standard library. It amazes me that when I set the seed and produce a few random numbers:
    import random

    random.seed(1)
    for i in range(5):
        print(random.random())
The numbers produced are exactly the same as the sample in the article. I think it's safe to say the algorithm is deterministic when the seed is set.
And when the seed is not set, the standard library seeds with time.time().
Now suppose an online service uses random.random() to generate a captcha code. Can a hacker use the same random generator to reproduce the captcha easily?
- Let's assume the hacker knows the algorithm that converts the random number into a captcha code. Otherwise, it seems quite impossible.
- Since random.seed() is called when the module is imported, I assume that for a web application the time used as the seed is close to the time the request is sent (within a few seconds). Wouldn't it be easy to calibrate with a few tries?
Am I worrying too much, or is this a real vulnerability?
See this answer for secure random.
It shouldn't surprise you that the sequence is deterministic after seeding. That's the whole point of seeding. random.random is known as a PRNG, a pseudo-random number generator. This is not unique to Python; every language's simple random source is deterministic in this way.

And yes, people who are genuinely concerned about security will worry that an attacker could reproduce the sequence. That's why other sources of randomness are available, like os.urandom, but they are more expensive.

But the problem is not as bad as you say: for a web request, typically a process handles more than one request, so the module is initialized at some unknown point in the past, not when the web request was received.
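A minimal sketch of the difference, using only the standard library: two random.Random instances (the class behind random.random) seeded with the same value produce the same sequence, while os.urandom takes no seed at all. The variable names are only illustrative.

```python
import os
import random

# Two independent Mersenne Twister generators, seeded identically.
gen_a = random.Random(1)
gen_b = random.Random(1)

seq_a = [gen_a.random() for _ in range(5)]
seq_b = [gen_b.random() for _ in range(5)]

# Same seed, same algorithm: the sequences match element for element.
same_sequence = (seq_a == seq_b)
print(same_sequence)

# os.urandom reads from the operating system's random source.
# There is nothing to seed, so there is no sequence to replay.
token_1 = os.urandom(16)
token_2 = os.urandom(16)
print(token_1 != token_2)
```

Anyone who learns the seed of gen_a can build gen_b and predict every value; that shortcut does not exist for os.urandom.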
The existing answers are great, but I'll just add a few points.
Update:
Actually, if you don't supply a seed, the random number generator is seeded with random bits from the system random source; it only falls back to using the system time as a seed if the OS doesn't have a random source. Also note that recent versions of Python can use an improved seeding scheme. From the docs:
Generating a CAPTCHA code is not a high-security application compared to, say, generating secret cryptographic keys, especially keys that are intended to be used multiple times. As a corollary, the amount of entropy required to generate a CAPTCHA code is smaller than what's required for a cryptographic key.
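As a rough illustration of that gap, the sketch below compares the entropy of a hypothetical 6-character code drawn uniformly from 36 symbols with that of a 128-bit key. The code length, alphabet size, and key size are assumptions chosen for the example, not figures from any particular service.

```python
import math

def entropy_bits(alphabet_size, length):
    """Entropy in bits of a string whose characters are chosen
    uniformly and independently from an alphabet of the given size."""
    return length * math.log(alphabet_size, 2)

# Hypothetical CAPTCHA: 6 characters from a-z plus 0-9 (36 symbols).
captcha_bits = entropy_bits(36, 6)

# Assumed key size for comparison.
key_bits = 128

print("CAPTCHA: about %.1f bits" % captcha_bits)  # about 31.0 bits
print("Key:     %d bits" % key_bits)
```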
Bear in mind that the system time used to seed random is (probably) not the system time in seconds - it's more likely to be the time in microseconds, or even nanoseconds, so it's not easy for an attacker to figure out the seed with a brute-force search, apart from the considerations mentioned by Ned.

Here's a quick demo, running on Python 2.6.6 on a 2GHz Linux system.
Typical output
As you can see, less than 3 milliseconds elapse between the start of the outer loop and its end, but all of the lists in a are quite different.

Note that the seed passed to random.seed() can be any hashable object, and when you pass it a non-integer (e.g. a float like the system time), it first gets hashed to create an integer.

Still, there's no need to merely use the system time as the seed: you can use SystemRandom / os.urandom() to get the seed. That way, the seed is more unpredictable, but you get the speed of Mersenne Twister; SystemRandom is a little slower than Mersenne Twister because it has to make system calls. However, even urandom isn't totally safe.

From the GNU urandom man page:
The Python documentation has this to say:
So, using it for CAPTCHA is not likely to be a good idea.
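For completeness, here is a minimal sketch of both options mentioned in this answer: drawing a code directly from random.SystemRandom, and seeding a Mersenne Twister instance from the system random source. The alphabet, the code length, and the function names are hypothetical choices made for the example.

```python
import random
import string

# Hypothetical CAPTCHA parameters, chosen only for illustration.
CAPTCHA_ALPHABET = string.ascii_uppercase + string.digits
CAPTCHA_LENGTH = 6

# SystemRandom draws from the OS random source (os.urandom);
# calling seed() on it has no effect.
_sysrand = random.SystemRandom()

def make_captcha_code(length=CAPTCHA_LENGTH):
    """Build a code by picking each character from the OS random source."""
    return "".join(_sysrand.choice(CAPTCHA_ALPHABET) for _ in range(length))

def make_seeded_generator():
    """Return a Mersenne Twister generator whose seed comes from the
    OS random source instead of the system time."""
    return random.Random(_sysrand.getrandbits(128))

print(make_captcha_code())
print(make_seeded_generator().random())
```

The first helper pays the cost of system calls on every character; the second pays it once per generator and then runs at Mersenne Twister speed, with the caveat that only the seed is unpredictable, not the algorithm.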