I'm trying to take advantage of constant memory, but I'm having a hard time figuring out how to nest arrays. What I have is an array of entries, each holding counts and pointers to internal data, and those counts differ per entry. Based on the simplified code below I have two problems. First, I don't know how to allocate the data pointed to by the members of my structure. Second, since I can't use cudaGetSymbolAddress on constant memory, I'm not sure whether I can just pass the global pointer (which you cannot do with plain __device__ memory).
struct __align__(16) data {
int nFiles;
int nNames;
int* files;
int* names;
};
__device__ __constant__ data *mydata;
__host__ void initMemory(...)
{
cudaMalloc( (void **) &(mydata), sizeof(data)*dynamicsize );
for(int i = 0; i < dynamicsize; i++)
{
cudaMemcpyToSymbol(mydata, &(nFiles[i]), sizeof(int), sizeof(data)*i, cudaMemcpyHostToDevice);
//...
//Problem 1: Allocate & Set mydata[i].files
}
}
__global__ void myKernel(data *constDataPtr)
{
//Problem 2: Access constDataPtr[n].files, etc
}
int main()
{
//...
myKernel<<< grid, threads >>>(mydata);
}
Thanks for any help offered. :-)
Why don't you just use the so-called "packed" data representation? This approach lets you place all the data you need into a one-dimensional byte array. E.g., if you need to store several variable-length lists, you can store them in a single flat array by writing each entry's count followed by its elements, one entry after another.
These two threads should help you:
http://forums.nvidia.com/index.php?showtopic=30269&hl=embedded
I think constant memory is 64 KB and you cannot allocate it dynamically using cudaMalloc. It has to be declared as a fixed-size __constant__ array, say,
Similarly, you don't need to free it. Also, you shouldn't pass a reference to it via a pointer; just access it as a global variable. I tried doing a similar thing and it gave me a segfault (in device emulation mode).
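The fixed-size declaration alluded to above might look something like this (a sketch; `MAX_ENTRIES` is an assumed compile-time bound, not from the original post):

```cuda
#define MAX_ENTRIES 64

// Fixed-size array in constant memory; size is set at compile time.
__constant__ data mydata[MAX_ENTRIES];

__host__ void initMemory(const data* hostData, int n)
{
    // The contents can still be filled in at runtime; only the size is fixed.
    cudaMemcpyToSymbol(mydata, hostData, sizeof(data) * n);
}

__global__ void myKernel()
{
    // Accessed directly as a global, no pointer argument needed.
    int count = mydata[blockIdx.x].nFiles;
    // ...
}
```

Note that the `int*` members would still have to point at global memory allocated with cudaMalloc; only the struct headers themselves would live in constant memory.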
No, you can't do that.
Constant memory (64 KB max) must be declared with a fixed size before compilation; you can fill it at runtime with cudaMemcpyToSymbol, but you cannot allocate it dynamically.
However, you can bind texture memory on the fly, which is also cached on the device.
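A sketch of that on-the-fly binding with the texture-reference API (names are illustrative; this API was later deprecated in favor of texture objects):

```cuda
// 1D texture reference over linear device memory; reads go through the
// texture cache.
texture<int, 1, cudaReadModeElementType> texFiles;

__global__ void readFiles(int n)
{
    if (threadIdx.x < n) {
        int v = tex1Dfetch(texFiles, threadIdx.x);  // cached fetch
        // ... use v ...
    }
}

__host__ void setup(const int* hostFiles, int n)
{
    int* devFiles;
    cudaMalloc((void**)&devFiles, n * sizeof(int));
    cudaMemcpy(devFiles, hostFiles, n * sizeof(int), cudaMemcpyHostToDevice);
    // Unlike __constant__ memory, the buffer size and binding are
    // chosen at runtime and can be rebound to a different allocation.
    cudaBindTexture(0, texFiles, devFiles, n * sizeof(int));
}
```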