Efficient way to generate and use millions of random numbers

Published 2019-02-16 12:39

Question:

I'm in the process of working on programming project that involves some pretty extensive Monte Carlo simulation in Python, and as such the generation of a tremendous number of random numbers. Very nearly all of them, if not all of them, will be able to be generated by Python's built in random module.

I'm something of a coding newbie, and unfamiliar with efficient and inefficient ways to do things. Is it faster to generate say, all the random numbers as a list, and then iterate through that list, or generate a new random number each time a function is called, which will be in a very large loop?

Or some other, undoubtedly more clever method?

Answer 1:

Generate a random number each time. Since the inner workings of the loop only care about a single random number, generate and use it inside the loop.

Example:

# do this:
import random

SOMEVERYLARGENUMBER = 10 ** 7  # however many iterations you need

for x in range(SOMEVERYLARGENUMBER):
    n = random.randint(1, 1000)  # whatever your range of random numbers is
    # Do stuff with n
    pass

# don't do this:
import random

# This list comprehension generates all the random numbers up front in a list
numbers = [random.randint(1, 1000) for x in range(SOMEVERYLARGENUMBER)]

for n in numbers:
    # Do stuff with n
    pass

Obviously, in practical terms it doesn't matter much unless you're dealing with billions and billions of iterations, but why bother storing all those numbers if you're only going to be using one at a time?
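If in doubt, you can measure the two approaches yourself with the standard `timeit` module. This is a minimal sketch, not from the original answer; the iteration count `N` and repeat count are arbitrary choices:

```python
import random
import timeit

N = 100_000  # arbitrary iteration count for the comparison

def one_at_a_time():
    # Generate each number inside the loop, as Answer 1 recommends.
    total = 0
    for _ in range(N):
        total += random.randint(1, 1000)
    return total

def precomputed_list():
    # Build the full list first, then iterate over it.
    numbers = [random.randint(1, 1000) for _ in range(N)]
    total = 0
    for n in numbers:
        total += n
    return total

# On most machines the two come out close in time, but the list
# version also has to hold N integers in memory simultaneously.
t1 = timeit.timeit(one_at_a_time, number=5)
t2 = timeit.timeit(precomputed_list, number=5)
print(f"one at a time:    {t1:.3f}s")
print(f"precomputed list: {t2:.3f}s")
```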



Answer 2:

import random

for x in (random.randint(0, 80) for x in range(1000 * 1000)):
    print(x)

The generator expression between the parentheses yields only one item at a time, so it's memory-efficient.



Answer 3:

Python's built-in random module, e.g. random.random() or random.randint() (some other distributions are also available; you probably want Gaussian), manages about 300K samples/s.

Since you are doing numerical computation, you probably use numpy anyway. It offers better performance if you generate random numbers one array at a time instead of one number at a time, plus a wider choice of distributions: at ~60K arrays/s * 1024 (array length), that's ~60M samples/s.
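A minimal sketch of batched generation with numpy (the array size and value range here are arbitrary choices, not from the original answer):

```python
import numpy as np

rng = np.random.default_rng()  # numpy's recommended generator interface

# One call fills a whole array, which is far cheaper than a million
# Python-level randint() calls.
uniform_ints = rng.integers(1, 1001, size=1_000_000)  # ints in [1, 1000]
gaussians = rng.normal(loc=0.0, scale=1.0, size=1_000_000)

print(uniform_ints[:5], gaussians[:5])
```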

You can also read /dev/urandom on Linux and OS X; my hardware/software (an OS X laptop) manages ~10MB/s.
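A portable way to do the same from Python is os.urandom, which reads the OS entropy source. This is a sketch I'm adding for illustration; the chunk size and count are arbitrary choices:

```python
import os

CHUNK = 8  # bytes per number; 8 bytes -> 64-bit unsigned integers

# One call fetches enough bytes for 1000 numbers at once.
raw = os.urandom(CHUNK * 1000)

# Slice the byte string into fixed-width integers.
numbers = [int.from_bytes(raw[i:i + CHUNK], "little")
           for i in range(0, len(raw), CHUNK)]

print(numbers[:3])
```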

Surely there must be faster ways to generate random numbers en masse, e.g.:

from Crypto.Cipher import AES
from Crypto.Util import Counter
import secrets

# CTR mode turns AES into a fast keystream generator.
aes = AES.new(secrets.token_bytes(16), AES.MODE_CTR, counter=Counter.new(128))
data = b"0" * 2 ** 20  # 1 MiB of input per encrypt() call; must be bytes
with open("filler.bin", "wb") as f:
    while True:
        f.write(aes.encrypt(data))

This generates 200MB/s on a single core of an i5-4670K.

Common ciphers like AES and Blowfish manage 112MB/s and 70MB/s on my stack. Furthermore, modern processors make AES even faster, up to some 700MB/s; see this link for test runs on a few hardware combinations. (edit: link broken). You could use the weaker ECB mode, provided you feed distinct inputs into it, and achieve up to 3GB/s.

Stream ciphers are better suited for the task; e.g. RC4 tops out at 300MB/s on my hardware. You may get the best results from the most popular ciphers, as more effort was spent optimising those in both hardware and software.



Answer 4:

Code to generate 10M random numbers:

import random

l = 10000000
listrandom = []
for i in range(l):
    value = random.randint(0, l)
    listrandom.append(value)
print(listrandom)

Time taken, including the I/O time spent printing to the screen:

real    0m27.116s
user    0m24.391s
sys 0m0.819s
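As an aside (not from the original answer): the append loop above can often be replaced by a single random.choices call, which samples with replacement in one C-level call and so may run faster; a sketch under that assumption, with timing left to the reader's machine:

```python
import random

l = 10_000_000  # how many numbers to generate

# range(l + 1) matches randint(0, l), which is inclusive at both ends.
# One choices() call replaces 10M Python-level randint()/append() calls.
listrandom = random.choices(range(l + 1), k=l)

print(len(listrandom), listrandom[:5])
```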


Tags: python random