cudaMallocManaged with vector<complex<long double> > C++ - NVI

I am implementing multithreading on an NVIDIA GeForce GT 650M GPU for a simulation I have created. To make sure everything works properly, I wrote some side code as a test. At one point I need to update a vector of variables (each element can be updated independently).

Here is the gist of it:

`__device__
int doComplexMath(float x, float y)
{
    return x+y;
}`

`// Kernel function to add the elements of two arrays
__global__
void add(int n, float *x, float *y, vector<complex<long double> > *z)
{
    int index = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        z[i] = doComplexMath(*x, *y);
}`

`int main(void)
{
    int iGAMAf = 1<<10;
    float *x, *y;
    vector<complex<long double> > VEL(iGAMAf,zero);
    // Allocate Unified Memory – accessible from CPU or GPU
    cudaMallocManaged(&x, sizeof(float));
    cudaMallocManaged(&y, sizeof(float));
    cudaMallocManaged(&VEL, iGAMAf*sizeof(vector<complex<long double> >));
    // initialize x and y on the host
    *x = 1.0f;
    *y = 2.0f;
    // Run kernel on 1<<10 (1024) elements on the GPU
    int blockSize = 256;
    int numBlocks = (iGAMAf + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(iGAMAf, x, y, *VEL);
    // Wait for GPU to finish before accessing on host
    cudaDeviceSynchronize();
    return 0;
}`

I am trying to allocate unified memory (memory accessible from the GPU and CPU). When compiling using nvcc, I get the following error:

error: no instance of overloaded function "cudaMallocManaged" matches the argument list
argument types are: (std::__1::vector<std::__1::complex<long double>, std::__1::allocator<std::__1::complex<long double>>> *, unsigned long)

How can I overload the function properly in CUDA to use this type with multithreading?

It isn't possible to do what you are trying to do.

To allocate a vector using managed memory you would have to write your own allocator class, one that satisfies the C++ Allocator requirements (std::vector accesses it through std::allocator_traits) and calls cudaMallocManaged under the hood. You can then instantiate a std::vector using your allocator class.
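A minimal sketch of what such an allocator might look like, under stated assumptions. The names `ManagedAllocator`, `ManagedVector` and `managed_vector_demo` are hypothetical and not from the question. `double` stands in for `long double`, since device code does not support `long double` as far as I know. The `#ifdef __CUDACC__` fallback to `std::malloc` is only there so the sketch also builds with a plain host compiler; compiled with nvcc it takes the `cudaMallocManaged` path.

```cpp
#include <complex>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

#ifdef __CUDACC__
#include <cuda_runtime.h>
#endif

// Hypothetical name: a minimal allocator that hands out unified (managed) memory.
template <class T>
struct ManagedAllocator {
    using value_type = T;

    ManagedAllocator() = default;
    template <class U>
    ManagedAllocator(const ManagedAllocator<U>&) noexcept {}

    T* allocate(std::size_t n) {
        void* p = nullptr;
#ifdef __CUDACC__
        // Unified memory: the same pointer is valid on host and device.
        if (cudaMallocManaged(&p, n * sizeof(T)) != cudaSuccess)
            throw std::bad_alloc();
#else
        // Assumption: host-only fallback so the sketch builds without nvcc.
        p = std::malloc(n * sizeof(T));
        if (!p) throw std::bad_alloc();
#endif
        return static_cast<T*>(p);
    }

    void deallocate(T* p, std::size_t) noexcept {
#ifdef __CUDACC__
        cudaFree(p);
#else
        std::free(p);
#endif
    }
};

// The allocator is stateless, so any two instances are interchangeable.
template <class T, class U>
bool operator==(const ManagedAllocator<T>&, const ManagedAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const ManagedAllocator<T>&, const ManagedAllocator<U>&) { return false; }

using cplx = std::complex<double>;
using ManagedVector = std::vector<cplx, ManagedAllocator<cplx>>;

// Hypothetical helper: builds a vector whose storage comes from the allocator.
inline std::size_t managed_vector_demo() {
    ManagedVector vel(1 << 10, cplx(0.0, 0.0));
    cplx* raw = vel.data();   // a kernel would receive this pointer, not the vector
    raw[0] = cplx(3.0, 0.0);  // host write through the same storage
    return vel.size();
}
```

Only the element storage ends up in managed memory here. The vector object itself still lives on the host, so a kernel would be handed `vel.data()` and the element count, and would need a device-usable complex type to interpret the elements.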

Also note that your CUDA kernel code is broken in that you can't use std::vector in device code.
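For illustration, here is one way the kernel side could be restructured so that it only sees a raw pointer. This is a sketch, not the only fix: it swaps in `thrust::complex<double>` (an assumption on my part, as the question uses `std::complex<long double>`), allocates the output array directly with `cudaMallocManaged`, and omits error checking on the CUDA calls for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>
#include <thrust/complex.h>

// Assumption: double replaces long double, which device code does not support.
using cplx = thrust::complex<double>;

__device__ cplx doComplexMath(float x, float y)
{
    return cplx(x + y, 0.0);
}

// The kernel receives a raw pointer to the elements, not a std::vector.
__global__ void add(int n, const float *x, const float *y, cplx *z)
{
    int index  = blockIdx.x * blockDim.x + threadIdx.x;
    int stride = blockDim.x * gridDim.x;
    for (int i = index; i < n; i += stride)
        z[i] = doComplexMath(*x, *y);
}

int main()
{
    const int n = 1 << 10;
    float *x, *y;
    cplx *z;

    // Allocate unified memory, accessible from CPU or GPU
    cudaMallocManaged(&x, sizeof(float));
    cudaMallocManaged(&y, sizeof(float));
    cudaMallocManaged(&z, n * sizeof(cplx));

    *x = 1.0f;
    *y = 2.0f;

    int blockSize = 256;
    int numBlocks = (n + blockSize - 1) / blockSize;
    add<<<numBlocks, blockSize>>>(n, x, y, z);

    // Wait for the GPU to finish before touching z on the host
    cudaDeviceSynchronize();
    printf("z[0] = (%f, %f)\n", z[0].real(), z[0].imag());

    cudaFree(x);
    cudaFree(y);
    cudaFree(z);
    return 0;
}
```

In real code each CUDA call's return value should be checked, and `cudaGetLastError()` called after the launch.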

Note that although the question has managed memory in view, this is applicable to other types of CUDA allocation such as pinned allocation.

As another alternative, you could use a thrust host vector in lieu of std::vector, together with a custom allocator. A worked example exists for the pinned allocator case (cudaMallocHost/cudaHostAlloc).