I have written a function in C++ to let me take advantage of the new Intel RdRand digital random number generator via an intrinsic function.
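#include <windows.h>    // PUINT32
#include <immintrin.h>  // _rdrand32_step
extern "C" __declspec(dllexport) int __stdcall GetRdRand32(PUINT32 pValue)
{
    // Returns 1 if a random value was stored in *pValue, 0 if RDRAND had none ready.
    return _rdrand32_step(pValue);
}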
I have wrapped it so that I can use it in C# via PInvoke and it is working fine as follows:
[DllImport("CppDynamicLinkLibrary.dll", CallingConvention = CallingConvention.StdCall)]
public static extern int GetRdRand32(out UInt32 value);
My use case will often involve requesting more than one random number, although probably only on the order of hundreds at a time (per requester). Since I'm using C++ anyway, would it make sense to write another function that returns a dynamic array (or vector) of random numbers? That is, would this greatly improve performance over making multiple calls into the C++ DLL? Performance is a concern because this will run in a server application that could be sending ~200 random numbers to many clients at around the same time.
If it is worthwhile, how would I go about doing it? I was thinking of something along the lines of the following, although my guess is that getting the vector into C# could easily become a performance concern:
__declspec(dllexport) void __stdcall vGetRdRand32(std::vector<UINT32> &pArray)
{
    for (std::vector<UINT32>::iterator It = pArray.begin(); It != pArray.end(); ++It)
        _rdrand32_step(&(*It)); // note: the success flag returned here is ignored
}
Finally, would using Marshal.Copy be better than the latter approach? If so, could anyone point me in the right direction?
Certainly, getting 200 random numbers from a single call will be faster than getting them from 200 separate calls. It might even be many times faster. But the absolute difference is likely small: microseconds, or at most a few milliseconds. So it might not be worth doing. Will a difference that small make a noticeable difference to the overall performance of your application?
If you do decide to do it, you probably don't want to mess with vector, but rather with UINT32[]. Marshaling a vector between C# and C++ would be difficult at best and, for all practical purposes, impossible. See Marshaling Different Types of Arrays for examples of how to marshal arrays.
You'll probably want to allocate the array in C# and pass it along with the size to the C++ function. That way, you don't have to worry about deallocating the memory. If you have the C++ code allocate the array and return it, then the C# code will have to call a C++ function to deallocate the memory.
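For comparison, here is a rough sketch of the approach where C++ allocates the array. AllocRdRand32Array, FreeRdRand32Array and RdRand32Retry are hypothetical names, and malloc/free is just one choice of allocator. The important part is that the memory is released by the same DLL that allocated it, which is why a separate free function is exported.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>      // std::malloc, std::free
#include <immintrin.h>  // _rdrand32_step

// Assumed export macros, as in a plain MSVC build they would be
// __declspec(dllexport) and __stdcall.
#if defined(_WIN32)
#define RDRAND_API extern "C" __declspec(dllexport)
#define RDRAND_CALL __stdcall
#else
#define RDRAND_API extern "C"
#define RDRAND_CALL
#endif

#if defined(__GNUC__) || defined(__clang__)
#define RDRAND_TARGET __attribute__((target("rdrnd")))
#else
#define RDRAND_TARGET
#endif

// Hypothetical helper: one 32-bit RDRAND value, retried up to 10 times.
static RDRAND_TARGET bool RdRand32Retry(uint32_t* out)
{
    for (int attempt = 0; attempt < 10; ++attempt) {
        unsigned int v = 0;
        if (_rdrand32_step(&v)) {
            *out = v;
            return true;
        }
    }
    return false;
}

// Allocates and fills count values inside the DLL. Returns nullptr on bad
// input, allocation failure or RDRAND failure. The caller owns the buffer
// and must release it with FreeRdRand32Array, never with a C# allocator.
RDRAND_API uint32_t* RDRAND_CALL AllocRdRand32Array(int count)
{
    if (count <= 0)
        return nullptr;
    auto* values = static_cast<uint32_t*>(
        std::malloc(static_cast<std::size_t>(count) * sizeof(uint32_t)));
    if (values == nullptr)
        return nullptr;
    for (int i = 0; i < count; ++i) {
        if (!RdRand32Retry(&values[i])) {
            std::free(values);
            return nullptr;
        }
    }
    return values;
}

// Releases a buffer returned by AllocRdRand32Array (nullptr is allowed).
RDRAND_API void RDRAND_CALL FreeRdRand32Array(uint32_t* values)
{
    std::free(values);
}
```

From C#, the pointer would come back as an IntPtr. It would then be copied out with Marshal.Copy and passed back to FreeRdRand32Array. Marshal.Copy has no UInt32[] overload, so the copy would go into an Int32[] (same bits) or be done with unsafe code. Either way, this costs an extra copy and an extra call compared with letting C# allocate the array.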
It rather depends on how fast you need to go. For the fastest RDRAND performance, use 64-bit RDRANDs and pull with multiple threads. Two threads pulling is 2X as fast as one thread pulling, even on two hyperthreads on the same core.
So if you set all threads on all cores pulling in parallel at 64 bits, you should be able to get close to 800 MBytes/s.
This may be counterintuitive, but it arises from parallelism in the on-chip buses.
A single thread in a loop might get 70 MBytes/s on Ivy Bridge.
So for just 200 random numbers, the call overhead will dominate. But for a few megabytes, spawning threads is worthwhile if you want it to be as fast as possible.
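Here is a minimal sketch of the multi-threaded, 64-bit approach. It assumes an x86-64 build, because _rdrand64_step is not available in 32-bit code. It uses std::thread with one contiguous chunk of the buffer per thread. The function names FillRange64 and ParallelRdRand64, the chunking scheme and the 10-retry policy are illustrative choices, not something the answer specifies. The sketch does not measure or guarantee any particular throughput.

```cpp
#include <algorithm>    // std::min, std::all_of
#include <cstddef>
#include <cstdint>
#include <immintrin.h>  // _rdrand64_step (x86-64 only)
#include <thread>
#include <vector>

#if defined(__GNUC__) || defined(__clang__)
#define RDRAND_TARGET __attribute__((target("rdrnd")))
#else
#define RDRAND_TARGET
#endif

// Fills buf[begin, end) with 64-bit RDRAND values, retrying each up to 10 times.
// Returns false if RDRAND never reported success for some element.
static RDRAND_TARGET bool FillRange64(uint64_t* buf, std::size_t begin, std::size_t end)
{
    for (std::size_t i = begin; i < end; ++i) {
        unsigned long long v = 0;
        int ok = 0;
        for (int attempt = 0; attempt < 10 && !ok; ++attempt)
            ok = _rdrand64_step(&v);
        if (!ok)
            return false;
        buf[i] = v;
    }
    return true;
}

// Splits buf into one contiguous chunk per thread and fills the chunks in
// parallel, so several threads issue RDRAND at the same time.
bool ParallelRdRand64(uint64_t* buf, std::size_t count, unsigned threads)
{
    if (threads == 0)
        threads = 1;
    const std::size_t chunk = (count + threads - 1) / threads;
    // char rather than bool: std::vector<bool> packs bits, so concurrent
    // writes to different elements would race.
    std::vector<char> ok(threads, 1);
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t begin = t * chunk;
        const std::size_t end = std::min(count, begin + chunk);
        if (begin >= end)
            break;
        workers.emplace_back([buf, begin, end, t, &ok] {
            ok[t] = FillRange64(buf, begin, end) ? 1 : 0;
        });
    }
    for (std::thread& w : workers)
        w.join();
    return std::all_of(ok.begin(), ok.end(), [](char c) { return c != 0; });
}
```

For example, ParallelRdRand64(buf.data(), buf.size(), std::thread::hardware_concurrency()) would fill a std::vector<uint64_t> using one thread per logical processor. hardware_concurrency() may return 0, which the sketch treats as 1. For a request of only ~200 values, creating threads would cost more than it saves, which is consistent with the point that this only pays off for bulk output.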