This question is a continuation of Interpreting the verbose output of ptxas, part I.
When we compile a kernel .ptx file with ptxas -v, or compile it from a .cu file with -ptxas-options=-v, we get a few lines of output such as:
ptxas info : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]
(same example as in the linked-to question; but with name demangling)
This question regards the last line. A few more examples from other kernels:
ptxas info : Used 19 registers, 336 bytes cmem[0], 4 bytes cmem[2]
...
ptxas info : Used 19 registers, 336 bytes cmem[0]
...
ptxas info : Used 6 registers, 16 bytes smem, 328 bytes cmem[0]
How do we interpret the information on this line, other than the number of registers used? Specifically:
- Is cmem short for constant memory?
- Why are there different categories of cmem, i.e. cmem[0], cmem[2], cmem[14]?
- smem probably stands for shared memory; is it only static shared memory?
- Under which conditions does each kind of entry appear on this line?
Collected and reformatted...
Resources on the last ptxas info line:
- registers - in the register file on every SM (multiprocessor)
- gmem - Global memory
- smem - Static shared memory
- cmem[N] - Constant memory bank with index N.
  - cmem[0] - Bank reserved for kernel arguments and statically-sized constant values
  - cmem[2] - ???
  - cmem[4] - ???
  - cmem[14] - ???

Each of these categories will be shown if the kernel uses any such memory (registers are probably always shown); thus it is no surprise that all the examples show some cmem[0] usage.

You can read a bit more on the CUDA memory hierarchy in Section 2.3 of the Programming Guide and the links there. Also, there's this blog post about static vs dynamic shared memory.
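If you want to compare these numbers across many kernels, the line can be split into its categories mechanically. Below is a minimal Python sketch; the function name parse_used_line is made up for illustration, and it assumes the line has exactly the shape of the examples in the question ("Used", then comma-separated entries of the form "N registers" or "N bytes NAME"). Other toolchain versions may print entries this sketch does not recognise, in which case it raises an error rather than guessing.

```python
import re

# One entry looks like "46 registers", "16 bytes smem" or "176 bytes cmem[0]".
# This pattern is an assumption based only on the example lines in the question.
_ENTRY = re.compile(r"\s*(\d+)\s+(?:bytes\s+)?(\S+)\s*")


def parse_used_line(line):
    """Split a 'ptxas info : Used ...' line into {resource name: amount}.

    Register counts are plain numbers; every other amount is in bytes.
    """
    _, sep, rest = line.partition("Used")
    if not sep:
        raise ValueError("not a 'Used ...' resource line")

    usage = {}
    for entry in rest.split(","):
        match = _ENTRY.fullmatch(entry)
        if match is None:
            raise ValueError(f"unrecognised entry: {entry!r}")
        amount, name = match.groups()
        usage[name] = int(amount)
    return usage


if __name__ == "__main__":
    example = "ptxas info : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]"
    print(parse_used_line(example))
```

A category that the kernel does not use is simply absent from the returned dictionary, which mirrors how ptxas omits it from the line.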
- Yes.
- They represent different constant memory banks. cmem[0] is the reserved bank for kernel arguments and statically sized constant values.
- It is, and how could it be otherwise.
- Mostly answered here.