My kernel call fails with "out of memory". It makes significant usage of the stack frame and I was wondering if this is the reason for its failure.
When invoking nvcc with --ptxas-options=-v it print the following profile information:
150352 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 59 registers, 40 bytes cmem[0]
Hardware: GTX480, sm20, 1.5GB device memory, 48KB shared memory/multiprocessor.
My question is where is the stack frame allocated: In shared, global memory, constant memory, ..?
I tried with 1 thread per block, as well as with 32 threads per block. Same "out of memory".
Another issue: One can only enlarge the number of threads resident to one multiprocessor if the total numbers of registers do not exceed the number of available registers at the multiprocessor (32k for my card). Does something similar apply to the stack frame size?