I've struggled with this all day. I am trying to get a random number generator working for the threads in my CUDA code. I have looked through all the forums, and yes, this topic comes up a fair bit, but I've spent hours trying to unravel all sorts of code to no avail. If anyone knows of a simple method, ideally a device function that can be called to return a random float between 0 and 1, or an integer that I can transform, I would be most grateful.
Again, I hope to use the random number in the kernel, just like rand() for instance.
Thanks in advance
Depending on your application, you should be wary of using LCGs without considering whether the streams (one stream per thread) will overlap. You could implement a leapfrog with an LCG, but then you would need an LCG with a sufficiently long period to ensure that the sequence doesn't repeat.
An example leapfrog could be:
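A minimal sketch of a leapfrog LCG, not a reference implementation. It assumes the 32-bit Numerical Recipes constants (a = 1664525, c = 1013904223, modulus 2^32 through unsigned wraparound); the names LeapfrogLcg, leapfrog_init and leapfrog_next are made up for illustration. With k threads, thread t takes elements t+1, t+1+k, t+1+2k, ... of the base sequence, so the streams do not overlap within one period. The HD macro only lets the same functions compile as plain C++ on the host.

```cpp
#include <stdint.h>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Per-thread generator state (hypothetical helper type).
struct LeapfrogLcg {
    uint32_t state;  // next value this thread will return
    uint32_t A;      // a^k mod 2^32
    uint32_t C;      // c * (a^(k-1) + ... + a + 1) mod 2^32
};

HD inline LeapfrogLcg leapfrog_init(uint32_t seed, uint32_t thread, uint32_t num_threads)
{
    const uint32_t a = 1664525u, c = 1013904223u;  // assumed LCG constants
    LeapfrogLcg g;
    g.A = 1u;
    g.C = 0u;
    // Compose the single step x -> a*x + c with itself num_threads times.
    for (uint32_t i = 0; i < num_threads; ++i) {
        g.C = g.C * a + c;
        g.A = g.A * a;
    }
    // Move to this thread's first element: thread + 1 single steps from the seed.
    g.state = seed;
    for (uint32_t i = 0; i <= thread; ++i) {
        g.state = g.state * a + c;
    }
    return g;
}

HD inline uint32_t leapfrog_next(LeapfrogLcg &g)
{
    uint32_t out = g.state;
    g.state = g.A * g.state + g.C;  // jump ahead by num_threads elements
    return out;
}
```

In a kernel, each thread would call leapfrog_init with its global index and the total thread count, then call leapfrog_next whenever it needs a value. The init loops run once per thread, which is wasteful for large grids; computing A and C once on the host and passing them in would avoid that.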
But then the period of that generator is probably insufficient in most cases.
To be honest, I'd look at using a third-party library such as NAG. There are some batch generators in the SDK too, but that's probably not what you're looking for in this case.
EDIT
Since this just got up-voted, I figure it's worth updating to mention that cuRAND, as mentioned by more recent answers to this question, is available and provides a number of generators and distributions. That's definitely the easiest place to start.
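For reference, a minimal sketch of the cuRAND device API, assuming one generator state per thread and an arbitrarily chosen seed of 1234; the kernel name uniform_kernel is hypothetical. curand_init takes a seed, a per-thread sequence number and an offset, and curand_uniform returns a float in (0, 1].

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <curand_kernel.h>

// Each thread sets up its own state and draws one uniform float.
__global__ void uniform_kernel(float *out, int n, unsigned long long seed)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    curandState state;
    // Same seed for all threads, distinct sequence number per thread.
    curand_init(seed, tid, 0, &state);
    out[tid] = curand_uniform(&state);  // excludes 0.0, includes 1.0
}

int main()
{
    const int n = 256;
    float h_out[n];
    float *d_out = nullptr;

    cudaMalloc(&d_out, n * sizeof(float));
    uniform_kernel<<<(n + 127) / 128, 128>>>(d_out, n, 1234ULL);
    cudaMemcpy(h_out, d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_out);

    printf("%f %f\n", h_out[0], h_out[1]);
    return 0;
}
```

Error checking is left out to keep the sketch short. If a thread needs many numbers, keep the curandState alive (for example in a per-thread array in global memory) instead of calling curand_init for every draw, since initialisation is the expensive part.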
You could try out Mersenne Twister for GPUs.
It is based on the SIMD-oriented Fast Mersenne Twister (SFMT), which is quite a fast and reliable random number generator. It passes Marsaglia's DIEHARD tests for random number generators.
I haven't found a good parallel random number generator for CUDA; however, I did find a parallel random number generator based on academic research here: http://sprng.cs.fsu.edu/
There's an MDGPU package (GPL) which includes an implementation of the GNU rand48() function for CUDA here.
I found it (quite easily, using Google, which I assume you tried :-) on the NVIDIA forums here.
I'm not sure I understand why you need anything special. Any traditional PRNG should port more or less directly. A linear congruential generator should work fine. Do you have some special properties you're trying to establish?
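To make the "port it directly" idea concrete, here is a rough sketch of an LCG written so it can be called from a kernel much like rand(). It assumes the Numerical Recipes constants (1664525 and 1013904223) and a caller-owned 32-bit state; the name lcg_uniform is made up for illustration.

```cpp
#include <stdint.h>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Advance the caller's state by one LCG step and return a float in [0, 1).
HD inline float lcg_uniform(uint32_t &state)
{
    state = state * 1664525u + 1013904223u;  // modulus 2^32 via wraparound
    // Use the top 24 bits, which a float represents exactly.
    return (state >> 8) * (1.0f / 16777216.0f);
}
```

Each thread has to keep its own state variable. How that state is seeded matters: threads seeded with nearby values walk the same 2^32 cycle from different starting points, so their streams can overlap or correlate.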
The best way to do this is to write your own device function. Here is one:
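The variable names m_w and m_z and the starting values 150 and 40 match George Marsaglia's multiply-with-carry generator, so what follows is a minimal sketch of that technique, not necessarily the exact code that was posted. The names MwcState, mwc_next and mwc_fill are hypothetical.

```cpp
#include <stdint.h>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Two 32-bit words of state, one per multiply-with-carry half.
struct MwcState {
    uint32_t m_w;
    uint32_t m_z;
};

// One step: update both halves and combine them into a 32-bit result.
HD inline uint32_t mwc_next(MwcState &s)
{
    s.m_z = 36969u * (s.m_z & 65535u) + (s.m_z >> 16);
    s.m_w = 18000u * (s.m_w & 65535u) + (s.m_w >> 16);
    return (s.m_z << 16) + s.m_w;
}

// Fill out[0..n-1] with successive values, starting from the given seeds.
HD inline void mwc_fill(uint32_t *out, int n, uint32_t m_w, uint32_t m_z)
{
    MwcState s = {m_w, m_z};
    for (int i = 0; i < n; ++i) {
        out[i] = mwc_next(s);
    }
}
```

Calling mwc_fill(buf, 100, 150, 40) on a 100-element buffer reproduces the setup described here.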
It gives you 100 random numbers, each a 32-bit result.
If you want random numbers between 1 and 1000, you can take result % 1000 + 1 (result % 1000 on its own gives 0 to 999), either at the point of consumption or at the point of generation.

Changing the starting values of m_w and m_z (150 and 40 in the example) gives you different results each time. You can use threadIdx.x as one of them, which should give each thread a different pseudorandom series.

I wanted to add that it works about twice as fast as the rand() function, and works great ;)
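The range mapping and per-thread seeding described above can be sketched as follows. This is an illustration under assumptions: the names ThreadRng, thread_rng_init, thread_rng_next and rand_1_to_1000 are hypothetical, m_z keeps the value 40, and the thread index is offset by one because a seed of zero would leave m_w stuck at zero.

```cpp
#include <stdint.h>

#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

struct ThreadRng {
    uint32_t m_w;
    uint32_t m_z;
};

// Seed m_w from the thread index; the +1 avoids the zero seed for thread 0.
HD inline ThreadRng thread_rng_init(uint32_t thread_id)
{
    ThreadRng s = {thread_id + 1u, 40u};
    return s;
}

// Same multiply-with-carry step as the 32-bit generator discussed above.
HD inline uint32_t thread_rng_next(ThreadRng &s)
{
    s.m_z = 36969u * (s.m_z & 65535u) + (s.m_z >> 16);
    s.m_w = 18000u * (s.m_w & 65535u) + (s.m_w >> 16);
    return (s.m_z << 16) + s.m_w;
}

// Map the 32-bit result to 1..1000 at the point of generation.
HD inline uint32_t rand_1_to_1000(ThreadRng &s)
{
    return thread_rng_next(s) % 1000u + 1u;
}
```

In a kernel, thread_rng_init(threadIdx.x) (or a global index built from blockIdx.x and blockDim.x) would be called once per thread. Two caveats: the modulo introduces a slight bias because 2^32 is not a multiple of 1000, and threads seeded this way all share the same m_z half, so their streams are related and should not be treated as independent.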