CUDA, cuPrintf causes “unspecified launch failure”

2020-04-19 11:56发布

问题:

I have a kernel which runs twice with different grid size.

My problem is with cuPrintf. When I don't have cudaPrintfInit() before kernel run and cudaPrintfDisplay(stdout, true) and cudaPrintfEnd() after kernel run, I have no error but when I put them there I get "unspecified launch failure" error.

In my device code, there is only one loop like this for printing:

if (threadIdx.x==0) {
     cuPrintf("MAX:%f x:%d y:%d\n", maxVal, blockIdx.x, blockIdx.y);
}

I'm using CUDA 4.0 with a card with cuda capability 2.0 and so I'm compiling my code with this syntax:

nvcc LB2.0.cu -arch=compute_20 -code=sm_20  

回答1:

If you are on a CC 2.0 GPU, you don't need cuPrintf at all -- CUDA has printf built-in for CC-2.0 and higher GPUs. So just replace your call to cuPrintf with this:

#if __CUDA_ARCH__ >= 200
if (threadIdx.x==0) {
    printf("MAX:%f x:%d y:%d\n", maxVal, blockIdx.x, blockIdx.y);
}
#endif

(Note you only need the #if / #endif lines if you are compiling your code for sm_20 and also earlier versions. With the example compilation command line you gave, you can eliminate them.)

With printf, you don't need cudaPrintfInit() or cudaPrintfDisplay() -- it is automatic. However if you print a lot of data, you may need to increase the default printf FIFO size with cudaDeviceSetLimit(), passing the cudaLimitPrintfFifoSize option.