I've been haunted by this error for quite a while so I decided to post it here.
This segmentation fault happened when a cudaMemcpy is called:
CurrentGrid->cdata[i] = new float[size];
cudaMemcpy(CurrentGrid->cdata[i], Grid_dev->cdata[i], size*sizeof(float),\
cudaMemcpyDeviceToHost);
CurrentGrid
and Grid_dev
are pointer to a grid
class object on host and device respectively and i=0 in this context. Class member cdata
is a float type pointer array. For debugging, right before this cudaMemcpy call I printed out the value of each element of Grid_Dev->cdata[i]
, the address of CurrentGrid->cdata[i]
and Grid_dev->cdata[i]
and the value of size
, which all looks good. But it still ends up with "Segmentation fault (core dumped)", which is the only error message. cuda-memcheck only gave "process didn't terminate successfully". I'm not able to use cuda-gdb at the moment. Any suggestion about where to go?
UPDATE: It seems now I have solved this problem by cudaMalloc another float pointer A on device and cudaMemcpy the value of Grid_dev->cdata[i] to A, and then cudaMemcpy A to host. So the segment of code written above becomes:
float * A;
cudaMalloc((void**)&A, sizeof(float));
...
...
cudaMemcpy(&A, &(Grid_dev->cdata[i]), sizeof(float *), cudaMemcpyDeviceToHost);
CurrentGrid->cdata[i] = new float[size];
cudaMemcpy(CurrentGrid->cdata[i], A, size*sizeof(float), cudaMemcpyDeviceToHost);
I did this because valgrind popped up "invalid read of size 8", which I thought referring to Grid_dev->cdata[i]
. I checked it again with gdb, printing out the value of Grid_dev->cdata[i]
being NULL. So I guess I cannot directly dereference the device pointer even in this cudaMemcpy call. But why ? According to the comment at the bottom of this thread , we should be able to dereference device pointer in cudaMemcpy function.
Also, I don't know the the underlying mechanism of how cudaMalloc and cudaMemcpy work but I think by cudaMalloc a pointer, say A here, we actually assign this pointer to point to a certain address on the device. And by cudaMemcpy the Grid_dev->cdata[i]
to A as in the modified code above, we re-assign the pointer A to point to the array. Then don't we lose the track of the previous address that A pointed to when it is cudaMalloced? Could this cause memory leak or something? If yes, how should I work around this situation properly?
Thanks!
For reference I put the code of the complete function in which this error happened below.
Many thanks!
__global__ void Print(grid *, int);
__global__ void Printcell(grid *, int);
void CopyDataToHost(param_t p, grid * CurrentGrid, grid * Grid_dev){
cudaMemcpy(CurrentGrid, Grid_dev, sizeof(grid), cudaMemcpyDeviceToHost);
#if DEBUG_DEV
cudaCheckErrors("cudaMemcpy1 error");
#endif
printf("\nBefore copy cell data\n");
Print<<<1,1>>>(Grid_dev, 0); //Print out some Grid_dev information for
cudaDeviceSynchronize(); //debug
int NumberOfBaryonFields = CurrentGrid->ReturnNumberOfBaryonFields();
int size = CurrentGrid->ReturnSize();
int vsize = CurrentGrid->ReturnVSize();
CurrentGrid->FieldType = NULL;
CurrentGrid->FieldType = new int[NumberOfBaryonFields];
printf("CurrentGrid size is %d\n", size);
for( int i = 0; i < p.NumberOfFields; i++){
CurrentGrid->cdata[i] = NULL;
CurrentGrid->vdata[i] = NULL;
CurrentGrid->cdata[i] = new float[size];
CurrentGrid->vdata[i] = new float[vsize];
Printcell<<<1,1>>>(Grid_dev, i);//Print out element value of Grid_dev->cdata[i]
cudaDeviceSynchronize();
cudaMemcpy(CurrentGrid->cdata[i], Grid_dev->cdata[i], size*sizeof(float),\
cudaMemcpyDeviceToHost); //where error occurs
#if DEBUG_DEV
cudaCheckErrors("cudaMemcpy2 error");
#endif
printf("\nAfter copy cell data\n");
Print<<<1,1>>>(Grid_dev, i);
cudaDeviceSynchronize();
cudaMemcpy(CurrentGrid->vdata[i], Grid_dev->vdata[i], vsize*sizeof(float),\
cudaMemcpyDeviceToHost);
#if DEBUG_DEV
cudaCheckErrors("cudaMemcpy3 error");
#endif
}
cudaMemcpy(CurrentGrid->FieldType, Grid_dev->FieldType,\
NumberOfBaryonFields*sizeof(int), cudaMemcpyDeviceToHost);
#if DEBUG_DEV
cudaCheckErrors("cudaMemcpy4 error");
#endif
}
EDIT: here is the information from valgrind, from which I'm trying to track down where the memory leak happened.
==19340== Warning: set address range perms: large range [0x800000000, 0xd00000000) (noaccess)
==19340== Warning: set address range perms: large range [0x200000000, 0x400000000) (noaccess)
==19340== Invalid read of size 8
==19340== at 0x402C79: CopyDataToHost(param_t, grid*, grid*) (CheckDevice.cu:48)
==19340== by 0x403646: CheckDevice(param_t, grid*, grid*) (CheckDevice.cu:186)
==19340== by 0x40A6CD: main (Transport.cu:81)
==19340== Address 0x2003000c0 is not stack'd, malloc'd or (recently) free'd
==19340==
==19340==
==19340== Process terminating with default action of signal 11 (SIGSEGV)
==19340== Bad permissions for mapped region at address 0x2003000C0
==19340== at 0x402C79: CopyDataToHost(param_t, grid*, grid*) (CheckDevice.cu:48)
==19340== by 0x403646: CheckDevice(param_t, grid*, grid*) (CheckDevice.cu:186)
==19340== by 0x40A6CD: main (Transport.cu:81)
==19340==
==19340== HEAP SUMMARY:
==19340== in use at exit: 2,611,365 bytes in 5,017 blocks
==19340== total heap usage: 5,879 allocs, 862 frees, 4,332,278 bytes allocated
==19340==
==19340== LEAK SUMMARY:
==19340== definitely lost: 0 bytes in 0 blocks
==19340== indirectly lost: 0 bytes in 0 blocks
==19340== possibly lost: 37,416 bytes in 274 blocks
==19340== still reachable: 2,573,949 bytes in 4,743 blocks
==19340== suppressed: 0 bytes in 0 blocks
==19340== Rerun with --leak-check=full to see details of leaked memory
==19340==
==19340== For counts of detected and suppressed errors, rerun with: -v
==19340== ERROR SUMMARY: 1 errors from 1 contexts (suppressed: 2 from 2)
I believe I know what the problem is, but to confirm it, it would be useful to see the code that you are using to set up the
Grid_dev
classes on the device.When a class or other data structure is to be used on the device, and that class has pointers in it which refer to other objects or buffers in memory (presumably in device memory, for a class that will be used on the device), then the process of making this top-level class usable on the device becomes more complicated.
Suppose I have a class like this:
I could instantiate the above class on the host, and then
malloc
an array ofint
and assign that pointer tomyptr
, and everything would be fine. To make this class usable on the device and the device only, the process could be similar. I could:myclass
myclass
on the host to the device pointer from step 1 using cudaMemcpymalloc
ornew
to allocate device storage formyptr
The above sequence is fine if I never want to access the storage allocated for
myptr
on the host. But if I do want that storage to be visible from the host, I need a different sequence:myclass
, let's call thismydevobj
myclass
on the host to the device pointermydevobj
from step 1 using cudaMemcpymyhostptr
int
storage on the device formyhostptr
myhostptr
from the host to the device pointer&(mydevobj->myptr)
After that, you can
cudaMemcpy
the data pointed to by the embedded pointermyptr
to the region allocated (viacudaMalloc
) onmyhostptr
Note that in step 5, because I am taking the address of this pointer location, this cudaMemcpy operation only requires the
mydevobj
pointer on the host, which is valid in a cudaMemcpy operation (only).The value of the device pointer
myint
will then be properly set up to do the operations you are trying to do. If you then want to cudaMemcpy data to and frommyint
to the host, you use the pointermyhostptr
in any cudaMemcpy calls, notmydevobj->myptr
. If we tried to usemydevobj->myptr
, it would require dereferencingmydevobj
and then using it to retrieve the pointer that is stored inmyptr
, and then using that pointer as the copy to/from location. This is not acceptable in host code. If you try to do it, you will get a seg fault. (Note that by way of analogy, mymydevobj
is like yourGrid_dev
and mymyptr
is like yourcdata
)Overall it is a concept that requires some careful thought the first time you run into it, and so questions like this come up with some frequency on SO. You may want to study some of these questions to see code examples (since you haven't provided your code that sets up
Grid_dev
):