Where does CUDA allocate the stack frame for kernels?

Posted 2019-01-26 11:42

My kernel call fails with "out of memory". It makes significant usage of the stack frame and I was wondering if this is the reason for its failure.

When invoking nvcc with --ptxas-options=-v, it prints the following profile information:

    150352 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
    ptxas info    : Used 59 registers, 40 bytes cmem[0]

Hardware: GTX480, sm_20, 1.5 GB device memory, 48 KB shared memory per multiprocessor.

My question is: where is the stack frame allocated? In shared memory, global memory, constant memory, ...?

I tried with 1 thread per block as well as with 32 threads per block; same "out of memory".

Another issue: one can only increase the number of threads resident on one multiprocessor if the total number of registers does not exceed the number of registers available on the multiprocessor (32K for my card). Does something similar apply to the stack frame size?

Tags: cuda stack
2 answers
做自己的国王
#2 · 2019-01-26 12:12

The stack frame is most likely in local memory.

I believe there is some limit on local memory usage, but even without it, the CUDA driver might allocate local memory for more than just one thread in your <<<1,1>>> launch configuration.

One way or another, even if you manage to run your code, I fear it may be quite slow because of all those stack operations. Try to reduce the number of function calls (e.g., by inlining those functions).
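If local-memory stack space is indeed the problem, the per-thread stack size can be inspected and raised from host code with the CUDA runtime API (cudaDeviceGetLimit / cudaDeviceSetLimit with cudaLimitStackSize). A minimal sketch; the 160 KB value is only illustrative, sized to cover the reported 150352-byte frame:

```
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    size_t stack_bytes = 0;
    cudaDeviceGetLimit(&stack_bytes, cudaLimitStackSize);
    printf("default per-thread stack: %zu bytes\n", stack_bytes);

    // Illustrative value: large enough to cover the 150352-byte frame.
    // This amount is reserved for every resident thread, so a large
    // setting can itself exhaust device memory on a 1.5 GB card.
    cudaError_t err = cudaDeviceSetLimit(cudaLimitStackSize, 160 * 1024);
    if (err != cudaSuccess)
        printf("cudaDeviceSetLimit failed: %s\n", cudaGetErrorString(err));
    return 0;
}
```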

chillily
#3 · 2019-01-26 12:24

The stack is allocated in local memory. Allocation is per physical thread (GTX480: 15 SMs × 1536 threads/SM = 23,040 threads). You are requesting 150,352 bytes/thread, i.e. ~3.4 GB of stack space. CUDA may reduce the maximum number of physical threads per launch if the stack size is that high. The CUDA language is not designed for a large per-thread stack.

In terms of registers, the GTX480 is limited to 63 registers per thread and 32K registers per SM.
