I am currently writing an app in Python that needs to generate a large amount of random numbers, FAST. Currently I have a scheme going that uses numpy to generate all of the numbers in a giant batch (about ~500,000 at a time). While this seems to be faster than the built-in random module, I still need it to go faster. Any ideas? I'm open to writing it in C and embedding it in the program, or doing whatever it takes.
Constraints on the random numbers:
- A set of 7 numbers that can all have different bounds:
- e.g. [0-X1, 0-X2, 0-X3, 0-X4, 0-X5, 0-X6, 0-X7]
- Currently I am generating a list of 7 numbers with random values from [0, 1), then multiplying by [X1..X7]
- A set of 13 numbers that all add up to 1
- Currently just generating 13 numbers, then dividing by their sum (a sketch of this scheme is at the end of the question)
Any ideas? Would pre-calculating these numbers and storing them in a file make this faster?
Thanks!
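For reference, here is a minimal sketch of the current scheme; the values in limits stand in for the real X1..X7 bounds:

    import numpy as np

    # Placeholder bounds for X1..X7
    limits = np.array([10, 20, 30, 40, 50, 60, 70], dtype=float)

    seven = np.random.random(7) * limits   # 7 numbers with different upper bounds

    thirteen = np.random.random(13)
    thirteen /= thirteen.sum()             # 13 numbers that add up to 1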
Just a quick example of numpy in action: no need for a loop, you can pass in how many numbers you want to generate.
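For example (the size and bounds here are placeholders):

    import numpy as np

    # One call produces the whole batch; no Python-level loop needed.
    samples = np.random.uniform(low=0.0, high=1.0, size=500000)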
Try
r = 1664525*r + 1013904223
from "an even quicker generator" in "Numerical Recipes in C" 2nd edition, Press et al., isbn 0521431085, p. 284.
np.random is certainly "more random"; see Linear congruential generator .
In Python, use np.uint32 so the arithmetic wraps around mod 2**32, and generate big blocks at a time.
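A sketch of what that might look like (the block size is a placeholder, and seeding every slot with consecutive values is good enough for a speed test but not statistically ideal):

    import numpy as np

    def lcg_block(r):
        # Advance every element by r = 1664525*r + 1013904223;
        # np.uint32 arithmetic wraps around mod 2**32 for free.
        r *= np.uint32(1664525)
        r += np.uint32(1013904223)
        return r

    r = np.arange(500000, dtype=np.uint32)   # one state per slot (placeholder seeding)
    lcg_block(r)                             # 500,000 raw 32-bit values per call
    u = r / np.float64(2**32)                # scale to floats in [0, 1) if needed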
Making your code run in parallel certainly couldn't hurt. Try adapting it for SMP with Parallel Python.
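I don't have a Parallel Python example handy, so here is a rough sketch of the same idea using the standard library's multiprocessing module instead (worker count, chunk size, and seeding are placeholders):

    import numpy as np
    from multiprocessing import Pool

    def make_chunk(args):
        seed, n = args
        rng = np.random.RandomState(seed)   # independent state per worker
        return rng.random_sample((n, 7))

    if __name__ == "__main__":
        n_workers, chunk = 4, 125000
        with Pool(n_workers) as pool:
            parts = pool.map(make_chunk, [(i, chunk) for i in range(n_workers)])
        data = np.vstack(parts)             # 500,000 x 7 result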
You can speed things up a bit from what mtrw posted above just by doing what you initially described (generating a bunch of random numbers and multiplying and dividing accordingly)...
Also, you probably already know this, but be sure to do the operations in-place (*=, /=, +=, etc) when working with large-ish numpy arrays. It makes a huge difference in memory usage with large arrays, and will give a considerable speed increase, too.
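Something along these lines, where limits is a placeholder array holding your X1..X7 bounds:

    import numpy as np

    limits = np.array([1, 2, 3, 4, 5, 6, 7], dtype=float)   # placeholder bounds
    num_rows = 1000000

    data = np.random.random((num_rows, 7))
    data *= limits   # in-place: scales each column without allocating a second big array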
As compared to:
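Something like the one-shot version, with the same placeholder names as above:

    data = np.random.random((num_rows, 7)) * limits   # the multiply allocates a second large array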
It's not a huge difference, but if you're really worried about speed, it's something.
Just to show that it's correct:
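For instance, checking the column ranges of the data array from the sketches above:

    data.min(axis=0)   # each column's minimum is close to 0
    data.max(axis=0)   # each column's maximum is close to the corresponding limit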
Likewise, for your "rows sum to one" part...
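A similar sketch (the row count is a placeholder):

    import numpy as np

    data = np.random.random((1000000, 13))
    data /= data.sum(axis=1)[:, np.newaxis]   # in-place: divide each row by its own sum

Each row of data now sums to 1.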
Honestly, even if you re-implement things in C, I'm not sure you'll be able to beat numpy by much on this one... I could be very wrong, though!
EDIT: Created functions that return the full set of numbers, not just one row at a time. EDIT 2: Made the functions more pythonic (and faster), and added a solution for the second question.
For the first set of numbers, you might consider numpy.random.randint or numpy.random.uniform, which take low and high parameters. Generating an array of 7 x 1,000,000 numbers in a specified range takes less than 0.7 seconds on my 2 GHz machine; this returns integers in [0, xLim-1] or floats in [0, fLim). The integer version took ~0.3 seconds and the double version ~0.66, on my 2 GHz single-core machine.
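A sketch of the kind of code those timings refer to (xLim and fLim hold placeholder bounds; looping over the columns avoids depending on newer numpy versions that broadcast array-valued bounds):

    import numpy as np

    nRows = 1000000
    xLim = [10, 20, 30, 40, 50, 60, 70]          # placeholder integer bounds
    fLim = [1.5, 2.5, 3.5, 4.5, 5.5, 6.5, 7.5]   # placeholder float bounds

    # Integer version: column k is uniform over [0, xLim[k] - 1]
    resultInt = np.empty((nRows, 7), dtype=int)
    for k, hi in enumerate(xLim):
        resultInt[:, k] = np.random.randint(0, hi, size=nRows)

    # Float version: column k is uniform over [0, fLim[k])
    resultFloat = np.empty((nRows, 7))
    for k, hi in enumerate(fLim):
        resultFloat[:, k] = np.random.uniform(0, hi, size=nRows)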
For the second set, I used @Joe Kingston's suggestion.
This takes ~1.6 seconds.
In all cases, result[k] gives you the kth set of data.

As others have already pointed out, numpy is a very good start, fast and easy to use. If you need random numbers on a massive scale, consider AES-ECB or RC4. Both can be parallelised; you should reach a throughput of several GB/s (achievable numbers posted here).
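For example, a rough sketch with PyCryptodome's AES in ECB mode (the key, counter layout, and sizes are placeholders; this favours raw throughput over statistical guarantees):

    import numpy as np
    from Crypto.Cipher import AES   # PyCryptodome (or the older PyCrypto)

    def aes_random_uint32(n_blocks, key=b"0123456789abcdef"):
        # Encrypt an incrementing counter with AES-ECB and reinterpret the
        # ciphertext as uniform 32-bit integers; every 16-byte block is distinct.
        cipher = AES.new(key, AES.MODE_ECB)
        counters = np.arange(2 * n_blocks, dtype=np.uint64)   # 16 bytes per block
        ciphertext = cipher.encrypt(counters.tobytes())
        return np.frombuffer(ciphertext, dtype=np.uint32)     # 4 values per block

    raw = aes_random_uint32(125000)       # 500,000 uniform 32-bit integers
    u = raw / np.float64(2**32)           # scale to [0, 1) floats if needed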