Interpreting the verbose output of ptxas, part II

Posted 2019-08-03 04:48

Question:

This question is a continuation of "Interpreting the verbose output of ptxas, part I".

When we compile a kernel .ptx file with ptxas -v, or compile it from a .cu file with -ptxas-options=-v, we get a few lines of output such as:

ptxas info    : Compiling entry function 'searchkernel(octree, int*, double, int, double*, double*, double*)' for 'sm_20'
ptxas info    : Function properties for searchkernel(octree, int*, double, int, double*, double*, double*)
    72 bytes stack frame, 0 bytes spill stores, 0 bytes spill loads
ptxas info    : Used 46 registers, 176 bytes cmem[0], 16 bytes cmem[14]

(This is the same example as in the linked question, but with name demangling.)

This question regards the last line. A few more examples from other kernels:

ptxas info    : Used 19 registers, 336 bytes cmem[0], 4 bytes cmem[2]
...
ptxas info    : Used 19 registers, 336 bytes cmem[0]
... 
ptxas info    : Used 6 registers, 16 bytes smem, 328 bytes cmem[0]

How do we interpret the information on this line, other than the number of registers used? Specifically:

  • Is cmem short for constant memory?
  • Why are there different categories of cmem, i.e. cmem[0], cmem[2], cmem[14]?
  • smem probably stands for shared memory; is it only static shared memory?
  • Under which conditions does each kind of entry appear on this line?

Answer 1:

Is cmem short for constant memory?

Yes

Why are there different categories of cmem, i.e. cmem[0], cmem[2], cmem[14]?

They represent different constant memory banks. cmem[0] is the reserved bank for kernel arguments and statically sized constant values.

smem probably stands for shared memory; is it only static shared memory?

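Yes, only static shared memory. It could not be otherwise: dynamic shared memory is sized by the launch configuration at run time, so ptxas has no way of knowing it at compile time.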
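To make the static/dynamic distinction concrete, here is a minimal sketch with two kernels that do the same thing, one through a statically sized shared array and one through a dynamically sized one. The kernel names and sizes are made up for illustration. Compiled with -ptxas-options=-v as in the question, I would expect only the first kernel to report an smem figure, but the exact numbers depend on the toolkit version and target architecture, so treat that as something to verify rather than a given.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Static shared memory: the size is fixed at compile time,
// so it is something ptxas can count.
__global__ void reverseStatic(const float* in, float* out)
{
    __shared__ float tileStatic[4];            // 4 * sizeof(float) = 16 bytes
    tileStatic[threadIdx.x] = in[threadIdx.x];
    __syncthreads();
    out[threadIdx.x] = tileStatic[3 - threadIdx.x];
}

// Dynamic shared memory: the size comes from the third launch parameter,
// which the compiler never sees.
__global__ void reverseDynamic(const float* in, float* out, int n)
{
    extern __shared__ float tileDynamic[];
    tileDynamic[threadIdx.x] = in[threadIdx.x];
    __syncthreads();
    out[threadIdx.x] = tileDynamic[n - 1 - threadIdx.x];
}

int main()
{
    const int n = 4;                           // must match the static array size
    float hIn[n] = {1.0f, 2.0f, 3.0f, 4.0f};
    float hOut[n] = {0.0f, 0.0f, 0.0f, 0.0f};

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    // Static version: no shared-memory size in the launch configuration.
    reverseStatic<<<1, n>>>(dIn, dOut);
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", hOut[i]);
    printf("\n");

    // Dynamic version: the byte count is supplied at launch time.
    reverseDynamic<<<1, n, n * sizeof(float)>>>(dIn, dOut, n);
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i) printf("%g ", hOut[i]);
    printf("\n");

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Both kernels use the same amount of shared memory when they run; the difference is only in who knows the size, the compiler or the launch site.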

Under which conditions does each kind of entry appear on this line?

Mostly answered here.



Answer 2:

Collected and reformatted...

Resources on the last ptxas info line:

  • registers - in the register file on every SM (multiprocessor)
  • gmem - Global memory
  • smem - Static Shared memory
  • cmem[N] - Constant memory bank with index N.
    • cmem[0] - Bank reserved for kernel arguments and statically-sized constant values
    • cmem[2] - ???
    • cmem[4] - ???
    • cmem[14] - ???
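One way to probe the unknown banks in the list above is to compile a small kernel that uses a user-declared __constant__ array and see where its bytes show up. The following is only a sketch: the names coeffs and scale are hypothetical, and which bank index ptxas reports for the array is exactly the thing to observe, since it may differ between architectures and toolkit versions.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// User-declared constant memory: 4 floats = 16 bytes.
__constant__ float coeffs[4];

// The kernel arguments (two pointers and an int) are what the answers
// above attribute to cmem[0].
__global__ void scale(const float* in, float* out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = in[i] * coeffs[i % 4];
}

int main()
{
    const int n = 8;
    float hCoeffs[4] = {1.0f, 2.0f, 3.0f, 4.0f};
    float hIn[n];
    float hOut[n];
    for (int i = 0; i < n; ++i) hIn[i] = 1.0f;

    float* dIn = nullptr;
    float* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMemcpy(dIn, hIn, n * sizeof(float), cudaMemcpyHostToDevice);

    // Fill the constant array from the host before launching.
    cudaMemcpyToSymbol(coeffs, hCoeffs, sizeof(hCoeffs));

    scale<<<1, n>>>(dIn, dOut, n);
    cudaMemcpy(hOut, dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    for (int i = 0; i < n; ++i) printf("%g ", hOut[i]);
    printf("\n");

    cudaFree(dIn);
    cudaFree(dOut);
    return 0;
}
```

Comparing the ptxas -v output with and without the coeffs array (for instance, by passing the coefficients as a kernel argument instead) should show which cmem entry changes.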

Each of these categories is shown only if the kernel uses that kind of memory (registers are probably always shown). Since cmem[0] holds the kernel arguments, it is no surprise that all the examples show some cmem[0] usage.

You can read a bit more on the CUDA memory hierarchy in Section 2.3 of the Programming Guide and the links there. Also, there's this blog post about static vs dynamic shared memory.