I'm getting an out-of-resources error when trying to launch a CUDA kernel (through PyCUDA), and I'm wondering if it's possible to get the system to tell me which resource I'm short on. Obviously the system knows which resource has been exhausted; I just want to be able to query that information.
I've used the occupancy calculator and everything seems okay, so either there's a corner case it doesn't cover or I'm using it wrong. I know it's not registers (which seem to be the usual culprit), because my kernel uses <= 63 registers per thread and it still fails with a 1x1x1 block and a 1x1 grid on a CC 2.1 device.
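For reference, a minimal sketch of the kind of cross-check I mean, using the resource attributes PyCUDA exposes on a compiled Function and the device limits (the trivial kernel here is just a stand-in for the real one that fails to launch):

```
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Stand-in kernel; the real one is whatever fails to launch.
mod = SourceModule("""
__global__ void my_kernel(float *out)
{
    out[threadIdx.x] = 0.0f;
}
""")
func = mod.get_function("my_kernel")

# Per-thread / per-launch resource usage of the compiled kernel.
print("registers per thread:   ", func.num_regs)
print("static shared memory:   ", func.shared_size_bytes)
print("local memory per thread:", func.local_size_bytes)

# Hard limits of the device the kernel will run on.
dev = pycuda.autoinit.device
attr = drv.device_attribute
print("max threads per block:   ", dev.get_attribute(attr.MAX_THREADS_PER_BLOCK))
print("max registers per block: ", dev.get_attribute(attr.MAX_REGISTERS_PER_BLOCK))
print("max shared mem per block:", dev.get_attribute(attr.MAX_SHARED_MEMORY_PER_BLOCK))
```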
Thanks for any help. I posted a thread on the NVidia boards:
http://forums.nvidia.com/index.php?showtopic=206261&st=0
but got no responses there. If the answer is "you can't ask the system for that information", that would be nice to know too (sort of... ;).
Edit:
The highest register usage I've seen is 63; I've edited the above to reflect that.
See this answer: CUDA maximum registers per thread: sm_12 vs sm_20. It seems 70 registers per thread is too many.
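If register pressure does turn out to be the problem, a rough sketch of how the count can be checked and capped from PyCUDA, relying only on SourceModule passing options through to nvcc (the kernel source is a placeholder):

```
import pycuda.autoinit
from pycuda.compiler import SourceModule

kernel_src = """
__global__ void my_kernel(float *out) { out[threadIdx.x] = 0.0f; }
"""

# --ptxas-options=-v makes ptxas report per-kernel register/shared/local usage
# (PyCUDA surfaces the compiler's output as a warning), and --maxrregcount caps
# register use per thread, spilling anything beyond the cap to local memory.
mod = SourceModule(kernel_src,
                   options=["--ptxas-options=-v", "--maxrregcount=63"])
```

Note that capping registers trades register pressure for local-memory spills, so it can slow the kernel down even when it makes the launch succeed.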
I think PyCUDA uses the CUDA driver API, so the following may be what is wrong: CUDA_ERROR_LAUNCH_OUT_OF_RESOURCES can happen if you do not specify enough arguments, or you specify the wrong size for the arguments, when using cuLaunch() to launch kernels. Since you are using PyCUDA, it is pretty easy to mismatch the argument list a kernel requires and the arguments you are actually passing, so you might want to check how you are calling your kernels. I think this is a poorly named error code for this situation...
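To illustrate (with a made-up kernel and names, and assuming a reasonably recent PyCUDA where prepared_call takes the block size), a sketch of the usual ways to avoid argument-size mismatches: give every scalar an explicit numpy type, or declare the argument layout once with prepare():

```
import numpy as np
import pycuda.autoinit
import pycuda.driver as drv
from pycuda.compiler import SourceModule

# Hypothetical kernel: a float pointer plus a 32-bit int count.
mod = SourceModule("""
__global__ void scale(float *data, int n)
{
    int i = threadIdx.x + blockIdx.x * blockDim.x;
    if (i < n) data[i] *= 2.0f;
}
""")
scale = mod.get_function("scale")

data = np.ones(256, dtype=np.float32)
data_gpu = drv.mem_alloc(data.nbytes)
drv.memcpy_htod(data_gpu, data)

# Give every scalar an explicit numpy type matching the C signature, so the
# packed argument buffer is exactly the size the kernel expects.
scale(data_gpu, np.int32(256), block=(256, 1, 1), grid=(1, 1))

# Or declare the argument layout once ("P" = pointer, "i" = 32-bit int) and
# let PyCUDA pack each call from that format; int() yields the raw pointer.
scale.prepare("Pi")
scale.prepared_call((1, 1), (256, 1, 1), int(data_gpu), 256)
```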