I'm getting an out-of-resources error when launching a CUDA kernel through PyCUDA. Is it possible to get the system to tell me which resource I'm short on? Obviously the system knows which resource has been exhausted; I just want to be able to query that as well.
I've used the occupancy calculator and everything seems okay, so either there's a corner case it doesn't cover or I'm using it wrong. I know it's not registers (which seem to be the usual culprit), because the kernel uses at most 63 registers per thread and the launch still fails with a 1x1x1 block and a 1x1 grid on a CC 2.1 device.
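To make the kind of check I have in mind concrete, here is a minimal sketch in plain Python that compares a launch's per-block resource needs against device limits and names whichever ones are exceeded. The helper names `launch_usage` and `find_exceeded` are hypothetical, the shared-memory figure of 0 bytes is a placeholder rather than a measurement from my kernel, and the limits are the documented per-block maximums for CC 2.x as I understand them. With PyCUDA, I believe the real numbers could come from the compiled function's `num_regs` and `shared_size_bytes` attributes and from `Device.get_attribute`, but this sketch does not call PyCUDA and ignores register allocation granularity.

```python
def launch_usage(regs_per_thread, shared_bytes, threads_per_block):
    """Estimate what one block of a launch asks for (hypothetical helper)."""
    return {
        # Registers are allocated per thread, so a block needs the product.
        "registers_per_block": regs_per_thread * threads_per_block,
        "shared_bytes_per_block": shared_bytes,
        "threads_per_block": threads_per_block,
    }


def find_exceeded(usage, limits):
    """Return the names of resources whose usage exceeds the given limit."""
    return [name for name, used in usage.items()
            if name in limits and used > limits[name]]


# Assumed per-block limits for a CC 2.x device (illustrative values).
limits = {
    "registers_per_block": 32768,
    "shared_bytes_per_block": 48 * 1024,
    "threads_per_block": 1024,
}

# The failing case from the question: 63 registers, 1x1x1 block.
# shared_bytes=0 is a placeholder, not a measured value.
small = launch_usage(regs_per_thread=63, shared_bytes=0, threads_per_block=1)
print(find_exceeded(small, limits))

# For contrast: the same kernel with a full 1024-thread block.
large = launch_usage(regs_per_thread=63, shared_bytes=0, threads_per_block=1024)
print(find_exceeded(large, limits))
```

By this arithmetic the 1x1x1 launch is nowhere near any of these three limits, which is why I suspect the exhausted resource is something this kind of check (and the occupancy calculator) does not model.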
Thanks for any help. I also posted a thread on the NVIDIA boards, but got no responses:
http://forums.nvidia.com/index.php?showtopic=206261&st=0
If the answer is "you can't ask the system for that information", that would be good to know too (sort of... ;).
Edit:
The highest register usage I've seen is 63. I edited the above to reflect that.