I've been messing around with this for a while now, but can't seem to get it right. I'm trying to copy objects that contain arrays into CUDA device memory (and back again, but I'll cross that bridge when I come to it):
struct MyData {
float *data;
int dataLen;
};
void copyToGPU() {
// Create dummy objects to copy
int N = 10;
MyData *h_items = new MyData[N];
for (int i=0; i<N; i++) {
h_items[i].dataLen = 100;
h_items[i].data = new float[100];
}
// Copy objects to GPU
MyData *d_items;
int memSize = N * sizeof(MyData);
cudaMalloc((void**)&d_items, memSize);
cudaMemcpy(d_items, h_items, memSize, cudaMemcpyHostToDevice);
// Run the kernel
MyFunc<<<100,100>>>(d_items);
}
__global__
static void MyFunc(MyData *data) {
int idx = blockIdx.x * blockDim.x + threadIdx.x;
for (int i=0; i<data[idx].dataLen; i++) {
// Do something with data[idx].data[i]
}
}
When I call MyFunc(d_items), I can access data[idx].dataLen just fine. However, data[idx].data has not been copied yet.
I can't use d_items[i].data in copyToGPU as the destination of cudaMalloc/cudaMemcpy calls, since host code cannot dereference a device pointer.
What to do?
The code you provide copies the MyData structures only: a host address and an integer. To be overly clear, you are copying the pointer and not the data - you have to explicitly copy the data yourself.
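As a sketch of what explicitly copying would look like with the question's layout (keeping the question's names; not necessarily the best approach, as the rest of this answer argues), you would allocate a device buffer for each element's data, patch the pointer in a host-side staging copy of the struct, and only then copy the array of structs:
// Sketch: deep-copy each element's data first, then copy structs whose
// .data members already point at device memory.
MyData *h_staging = new MyData[N];
MyData *d_items;
for (int i = 0; i < N; i++) {
    h_staging[i].dataLen = h_items[i].dataLen;
    cudaMalloc((void**)&h_staging[i].data, h_items[i].dataLen * sizeof(float));
    cudaMemcpy(h_staging[i].data, h_items[i].data,
               h_items[i].dataLen * sizeof(float), cudaMemcpyHostToDevice);
}
cudaMalloc((void**)&d_items, N * sizeof(MyData));
cudaMemcpy(d_items, h_staging, N * sizeof(MyData), cudaMemcpyHostToDevice);
That works, but it costs one cudaMalloc/cudaMemcpy pair per element, which is why the layouts below are usually preferable.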
If the data is always the same LENGTH, then you probably just want to make one big array (first sketch below). If it needs to be in a struct with other data, then an embedded fixed-size array works the same way (second sketch below).
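A rough sketch of the one-big-array version (LENGTH, the fill step, and the kernel body are placeholders):
#define LENGTH 100
float *h_data = new float[N * LENGTH];  // element i lives at h_data[i*LENGTH .. i*LENGTH + LENGTH-1]
// ... fill h_data ...
float *d_data;
cudaMalloc((void**)&d_data, N * LENGTH * sizeof(float));
cudaMemcpy(d_data, h_data, N * LENGTH * sizeof(float), cudaMemcpyHostToDevice);
__global__
static void MyFunc(float *data) {
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    for (int i = 0; i < LENGTH; i++) {
        // Do something with data[idx * LENGTH + i]
    }
}
And a sketch of the struct variant, where the array is embedded in the struct instead of pointed to, so the single cudaMalloc/cudaMemcpy of N structs from the question copies everything:
struct MyData {
    float data[LENGTH];  // embedded array, nothing to dereference
    int otherField;      // whatever else needs to travel with each element
};
// The copy is then exactly as in the question:
// cudaMalloc((void**)&d_items, N * sizeof(MyData));
// cudaMemcpy(d_items, h_items, N * sizeof(MyData), cudaMemcpyHostToDevice);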
But I am assuming your data comes in a variety of lengths. One solution is to set LENGTH to the maximum length (and just waste some space), and then do it the same way as above. That might be the easiest way to start, and you can optimize later.
If you can't afford the lost memory and transfer time, then I would have three arrays: one with all the data, one with offsets, and one with lengths, for both the host and device:
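A sketch of that layout (totalLen, the fill step, and the variable names are assumptions):
// Host side: one flat array of all the data, plus per-element offsets and lengths.
float *h_data    = new float[totalLen];  // every element's floats, back to back
int   *h_offsets = new int[N];           // h_offsets[i] = start index of element i in h_data
int   *h_lengths = new int[N];           // h_lengths[i] = number of floats in element i
// ... fill h_data, h_offsets, h_lengths ...
// Device side: mirror the three arrays and copy each one over.
float *d_data;    cudaMalloc((void**)&d_data,    totalLen * sizeof(float));
int   *d_offsets; cudaMalloc((void**)&d_offsets, N * sizeof(int));
int   *d_lengths; cudaMalloc((void**)&d_lengths, N * sizeof(int));
cudaMemcpy(d_data,    h_data,    totalLen * sizeof(float), cudaMemcpyHostToDevice);
cudaMemcpy(d_offsets, h_offsets, N * sizeof(int),          cudaMemcpyHostToDevice);
cudaMemcpy(d_lengths, h_lengths, N * sizeof(int),          cudaMemcpyHostToDevice);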
Now in thread i you can find the data that starts at d_data[d_offsets[i]] and has a length of d_lengths[i].