My first question: how can I get register-usage information for an OpenCL kernel on an NVIDIA GPU? For CUDA kernels, the nvcc compiler reports this when invoked with the nvcc --ptxas-options=-v
flag.
I was able to get the same information for an OpenCL kernel on an AMD GPU from the .isa file generated while running the program after exporting GPU_DUMP_DEVICE_KERNEL=3. I tried the same thing on the NVIDIA GPU, but no .isa file was generated. My second question: why does the NVIDIA GPU not generate an .isa file?
After some searching, I found that the way to get register and shared-memory usage for an OpenCL kernel on an NVIDIA GPU is to pass the -cl-nv-verbose option string to clBuildProgram() and then read the "binaries" of the compiled kernel code.
My third question: is this the correct way to get register usage on an NVIDIA GPU? What other ways are there to get the same information?
//Building the program...
clBuildProgram(program, 1, &device_id, "-cl-nv-verbose", NULL, NULL);
After building the program, I passed the two constants CL_PROGRAM_BINARY_SIZES and CL_PROGRAM_BINARIES
to clGetProgramInfo() to get the binaries of the compiled kernel code.
// Printing binaries of the compiled kernel code...
cl_uint program_num_devices;
cl_int ret;
size_t t;
ret = clGetProgramInfo(program, CL_PROGRAM_NUM_DEVICES, sizeof(cl_uint), &program_num_devices, NULL);
if (ret != CL_SUCCESS || program_num_devices == 0) {
    printf("No valid device was found\n");
    return;
}
size_t binary_sizes[program_num_devices];
char **binaries = (char **) malloc(program_num_devices * sizeof(char *));
// First call: get the size of each device binary...
ret = clGetProgramInfo(program, CL_PROGRAM_BINARY_SIZES, program_num_devices * sizeof(size_t), binary_sizes, NULL);
for (t = 0; t < program_num_devices; t++) {
    // One extra byte so the binary can be NUL-terminated for printing
    binaries[t] = (char *) malloc((binary_sizes[t] + 1) * sizeof(char));
}
// Second call: get the binaries themselves (an array of pointers, so the size is in units of sizeof(char *))...
ret = clGetProgramInfo(program, CL_PROGRAM_BINARIES, program_num_devices * sizeof(char *), binaries, NULL);
for (t = 0; t < program_num_devices; t++) {
    binaries[t][binary_sizes[t]] = '\0';
    printf("Binary ISA Info %s : %zu\n", binaries[t], binary_sizes[t]);
}
printf("ProgramNumDevices:: %u\n", program_num_devices);
for (t = 0; t < program_num_devices; t++) {
    free(binaries[t]);
}
free(binaries);
This prints the "binaries" of my compiled OpenCL kernel code, but it does not show the register and shared-memory usage. Why?
Please share any useful information.
Thanks in advance!
From a quick search, it looks like after building the program with -cl-nv-verbose, you get the verbose output with clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_LOG, ...).
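Once you have the log text from clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_LOG, ...), it can be long, so it may help to filter it down to the lines you care about. Here is a minimal sketch in plain C, not something from the NVIDIA documentation: the helper name print_matching_lines is hypothetical, and searching for the word "registers" is my assumption about how the resource-usage line is worded, which may differ between driver versions.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: print every line of `log` that contains `needle`
 * and return how many lines matched.
 * `log` is assumed to be the NUL-terminated text obtained from
 * clGetProgramBuildInfo(..., CL_PROGRAM_BUILD_LOG, ...).
 */
int print_matching_lines(const char *log, const char *needle)
{
    int matches = 0;
    const char *line = log;
    size_t nlen = strlen(needle);

    while (*line != '\0') {
        /* Find the end of the current line (newline or end of string). */
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t)(end - line) : strlen(line);

        /* Search for the needle within this line only. */
        for (size_t i = 0; nlen > 0 && i + nlen <= len; i++) {
            if (strncmp(line + i, needle, nlen) == 0) {
                printf("%.*s\n", (int)len, line);
                matches++;
                break;
            }
        }

        /* Advance past the newline, or stop at the end of the string. */
        line = end ? end + 1 : line + len;
    }
    return matches;
}
```

You would call it as print_matching_lines(build_log, "registers"), where build_log is whatever buffer you filled with the build log. If it returns 0, print the whole log instead and check how your driver words the usage line.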