How to create variable sized __local memory in pyo

2019-05-08 06:43发布

问题:

in my C OpenCL code I use clSetKernelArg to create 'variable size' __local memory for use in my kernels, which is not available in OpenCL per se. See my example:

clSetKernelArg(clKernel, ArgCounter++, sizeof(cl_mem), (void *)&d_B);
...
clSetKernelArg(clKernel, ArgCounter++, sizeof(float)*block_size*block_size, NULL);
...
kernel="    
    matrixMul(__global float* C,
                   ...
              __local float* A_temp,
                  ...
             )"
{...

My question is now, how to do the same in pyopencl?

I looked through the examples that come with pyopencl, but the only thing I could find was an approach using templates, which seems as to me as I understood it like an overkill. See example.

kernel = """
        __kernel void matrixMul(__global float* C,...){
                     ... 
            __local float A_temp[ %(mem_size) ];
                     ...
        }

What do you recommend?

回答1:

It is similar to C. You pass it a fixed size array as a local. Here is an example from Enja's radix sort. Notice the last argument is a local memory array.

 def naive_scan(self, num):
    nhist = num/2/self.cta_size*16
    global_size = (nhist,)
    local_size = (nhist,)
    extra_space = nhist / 16 #NUM_BANKS defined as 16 in RadixSort.cpp
    shared_mem_size = self.uintsz * (nhist + extra_space)
    scan_args = (   self.mCountersSum,
                    self.mCounters,
                    np.uint32(nhist),
                    cl.LocalMemory(2*shared_mem_size)
                )
    self.radix_prg.scanNaive(self.queue, global_size, local_size, *(scan_args)).wait()


回答2:

I am no familiar with Python and its OpenCL implementation, but a local memory can also be created within the kernel with a fixed size (similar what you did):

__kernel void matrixMul(...) {

    __local float A_templ[1024];

}

Instead of 1024 a defined preprocessor symbol can be used and can be set during compilation to change the size:

#define SIZE 1024    
__kernel void matrixMul(...) {

        __local float A_templ[SIZE];

    }

SIZE can be defined within the same soure, as compiler parameter for cLBuildProgram or as an additional source for clCreateProgramWithSource.

EDIT: Found something with Google ;-): http://www.google.com/url?sa=t&source=web&cd=4&ved=0CC8QFjAD&url=http%3A%2F%2Flinksceem.eu%2Fjoomla%2Ffiles%2FPRACE_Winter_School%2FLinkSCEMM_pyOpenCL.pdf&rct=j&q=Pyopencl%20__local%20memory&ei=BTbETbWhOsvBswadp62ODw&usg=AFQjCNG6rXEEkDpE1304pmQDu3GFdRA0BQ&sig2=vHOGOqwA1HHUl10c6HO8WQ&cad=rja