I have written a C++ function that uses an intrinsic to take advantage of Intel's new RdRand digital random number generator.
__declspec(dllexport) int __stdcall GetRdRand32(PUINT32 pValue)
{
    // Returns 1 if a random value was stored in *pValue, 0 if none was available.
    return _rdrand32_step(pValue);
}
I have wrapped it so that I can call it from C# via P/Invoke, and it works fine:
[DllImport("CppDynamicLinkLibrary.dll", CallingConvention = CallingConvention.StdCall)]
public static extern int GetRdRand32(out UInt32 value);
My use case will often involve requesting more than one random number, though probably only on the order of hundreds at a time (per requester). My question is: since I'm using C++ anyway, would it make sense to write another function that returns a dynamic array (or vector) of random numbers? That is, would this greatly improve performance over making multiple calls into the C++ DLL? Performance is a concern because this will run in a server application that could be sending ~200 random numbers to many clients at around the same time.
If it is worth doing, how would I go about it? I was thinking of something along the lines of the following, although I suspect that getting the vector into C# could easily become a performance concern:
__declspec(dllexport) void __stdcall vGetRdRand32(std::vector<UINT32> &pArray)
{
    // Fill each element in place; the success flag from _rdrand32_step is ignored here.
    for (std::vector<UINT32>::iterator It = pArray.begin(); It != pArray.end(); ++It)
        _rdrand32_step(&(*It));
}
Finally, would Marshal.Copy be better than the latter approach? If so, could anyone point me in the right direction?
Certainly, getting 200 random numbers from a single call will be faster than getting them from 200 separate calls, possibly many times faster. But the absolute difference is likely small, a few milliseconds at most, so it might not be worth doing. Will a difference that small noticeably affect the overall performance of your application?
If you do decide to do it, you probably don't want to mess with std::vector, but rather with a UINT32[] array. Marshaling a std::vector between C# and C++ would be difficult at best and, for all practical purposes, impossible.
See the Microsoft documentation page "Marshaling Different Types of Arrays" for examples of how to marshal arrays.
You'll probably want to allocate the array in C# and pass it along with the size to the C++ function. That way, you don't have to worry about deallocating the memory. If you have the C++ code allocate the array and return it, then the C# code will have to call a C++ function to deallocate the memory.
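Following the suggestion to allocate the array in C# and pass it with its size, here is a minimal sketch of what the C++ side could look like. The names FillRandom32 and GetRdRand32Array are hypothetical, and the retry limit of 10 follows Intel's DRNG guidance for transient RDRAND failures. The fill loop is templated on the step function so it can be exercised without RDRAND hardware; in the DLL you would pass a lambda wrapping _rdrand32_step, as shown in the comments.

```cpp
#include <cstdint>

// Retry budget for a failed RDRAND draw; Intel's DRNG guidance suggests 10 attempts.
constexpr int kRdRandRetries = 10;

// Hypothetical helper: fills values[0..count) using `step`, which must behave like
// _rdrand32_step (write a value through the pointer, return 1 on success, 0 on failure).
// Returns how many values were written, so the caller can detect a short fill.
template <typename Step>
int FillRandom32(Step step, std::uint32_t* values, int count)
{
    if (values == nullptr || count <= 0)
        return 0;
    for (int i = 0; i < count; ++i)
    {
        int ok = 0;
        for (int attempt = 0; attempt < kRdRandRetries && !ok; ++attempt)
            ok = step(&values[i]);
        if (!ok)
            return i;  // generator kept failing; report the partial count
    }
    return count;
}

// In the DLL, a hypothetical export built on it (needs <windows.h>, <immintrin.h>
// and an RDRAND-capable CPU):
//
// extern "C" __declspec(dllexport) int __stdcall GetRdRand32Array(UINT32* values, int count)
// {
//     return FillRandom32([](UINT32* p) { return _rdrand32_step(p); }, values, count);
// }
//
// and a matching C# declaration. The interop marshaler normally pins a blittable
// UInt32[] for the duration of the call, so no Marshal.Copy should be needed:
//
// [DllImport("CppDynamicLinkLibrary.dll", CallingConvention = CallingConvention.StdCall)]
// public static extern int GetRdRand32Array([Out] UInt32[] values, int count);
```

Because the buffer is owned by the C# caller, nothing has to be freed on the C++ side, and the return value tells the caller whether all count entries were filled.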
It rather depends on how fast you need to go. For the fastest RdRand performance, use 64-bit RdRands and pull from multiple threads. Two threads pulling is twice as fast as one thread pulling, even on two hyperthreads of the same core.
So if you set all threads on all cores pulling in parallel at 64 bits, you should be able to get close to 800 MB/s.
This may be counterintuitive, but this performance characteristic arises from the parallelism in the on-chip buses.
A single thread in a loop might get 70 MB/s on Ivy Bridge.
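As a rough check, assuming the 70 MB/s single-thread figure and 32-bit values, one batch of 200 numbers takes on the order of ten microseconds of RDRAND time:

```latex
$$\frac{200 \times 4\ \text{bytes}}{70 \times 10^{6}\ \text{bytes/s}} = \frac{800}{7 \times 10^{7}}\ \text{s} \approx 1.1 \times 10^{-5}\ \text{s} \approx 11\ \mu\text{s}$$
```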
So for just 200 random numbers, the call overhead will dominate. But for a few megabytes, spawning threads is worthwhile if you want it to be as fast as possible.
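For the multi-megabyte case, here is a minimal sketch of the multi-threaded approach: split a caller-owned buffer into contiguous chunks and have each std::thread fill its own chunk with 64-bit draws. ParallelFill64 is a hypothetical name, the thread count is left to the caller, and the step function is again a parameter so the code runs without RDRAND. On real hardware you would pass a lambda wrapping _rdrand64_step, which is available only when compiling for x64.

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: fills out[0..count) from `threads` worker threads, each owning a
// contiguous chunk. `step` must behave like _rdrand64_step (return 1 on success, 0 on
// failure) and be safe to call concurrently. Returns true only if every element was filled.
template <typename Step>
bool ParallelFill64(Step step, unsigned long long* out, std::size_t count, unsigned threads)
{
    if (threads == 0)
        threads = 1;
    const std::size_t chunk = (count + threads - 1) / threads;  // ceiling division
    std::vector<char> ok(threads, 1);  // one success flag per worker, no shared writes
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < threads; ++t)
    {
        const std::size_t begin = std::min(count, t * chunk);
        const std::size_t end = std::min(count, begin + chunk);
        workers.emplace_back([&step, &ok, out, t, begin, end] {
            for (std::size_t i = begin; i < end; ++i)
            {
                int got = 0;
                for (int attempt = 0; attempt < 10 && !got; ++attempt)  // bounded retry
                    got = step(&out[i]);
                if (!got)
                {
                    ok[t] = 0;  // record the failure; the rest of this chunk is left unfilled
                    return;
                }
            }
        });
    }
    for (std::thread& w : workers)
        w.join();
    return std::all_of(ok.begin(), ok.end(), [](char flag) { return flag != 0; });
}
```

On an RDRAND-capable x64 machine the call might be ParallelFill64([](unsigned long long* p) { return _rdrand64_step(p); }, buf, n, std::thread::hardware_concurrency()). The buffer type is unsigned long long so that it matches the intrinsic's pointer type on both MSVC and GCC; how close this gets to the throughput figures above depends on the CPU.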