OpenCL autocorrelation kernel

Posted 2020-08-05 05:54

I have written a simple program that computes an autocorrelation, shown below. I used PGI Accelerator directives to move the computation to the GPU.

// autocorrelation
void autocorr(float *restrict A, float *restrict C, int N)
{
    int i, j;
    float sum;
    #pragma acc region
    {
        for (i = 0; i < N; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++) {
                if ((i+j) < N)
                    sum += A[j] * A[i+j];
                else
                    continue;
            }
            C[i] = sum;
        }
    }
}

I wrote a similar program in OpenCL, but I am not getting correct results. The kernel is shown below. I am new to GPU programming, so apart from hints that could fix my error, any other advice is welcome.

__kernel void autocorrel1D(__global double *Vol_IN, __global double *Vol_AUTOCORR, int size)
{
    int j, gid = get_global_id(0);
    double sum = 0.0;

    for (j = 0; j < size; j++) {
        if ((gid+j) < size)
        {
            sum += Vol_IN[j] * Vol_IN[gid+j];
        }
        else
            continue;
    }

    barrier(CLK_GLOBAL_MEM_FENCE);
    Vol_AUTOCORR[gid] = sum;
}

Since I launch the kernel with a work dimension of 1, I am assuming that my get_global_id(0) call gives me the index of the current work-item, which I use to access the 1D input array.

Thanks,
Sayan

Tags: opencl
1 answer
(This account has been banned)
#2 · 2020-08-05 06:26

The kernel looks correct. As far as I know, it should run fine and give correct results.

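The barrier(CLK_GLOBAL_MEM_FENCE); call is not needed: each work-item writes only its own output element, so there is nothing to synchronize. The kernel will run faster without that statement.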

Your problem is probably outside the kernel. Check that you are passing the input to the device correctly and that you are reading the correct data back from the GPU.
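One way to narrow this down is to compute the same sums on the host and compare them with what comes back from the device. The following is only a sketch in plain C: the names autocorr_ref and first_mismatch are hypothetical helpers, not part of the original program, and the tolerance is something you would have to choose for your data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical CPU reference: c[i] = sum of a[j] * a[i + j] over all j
 * with i + j < n. This is the same sum the kernel computes per work-item. */
static void autocorr_ref(const double *a, double *c, size_t n)
{
    assert(a != NULL && c != NULL);
    for (size_t i = 0; i < n; i++) {
        double sum = 0.0;
        for (size_t j = 0; i + j < n; j++)
            sum += a[j] * a[i + j];
        c[i] = sum;
    }
}

/* Hypothetical comparison helper: returns the index of the first element
 * where the two buffers differ by more than tol (relative to the expected
 * value, with a floor of 1.0), or -1 if all n elements agree. */
static long first_mismatch(const double *expected, const double *actual,
                           size_t n, double tol)
{
    assert(expected != NULL && actual != NULL);
    for (size_t i = 0; i < n; i++) {
        double diff = expected[i] - actual[i];
        if (diff < 0.0)
            diff = -diff;
        double scale = expected[i] < 0.0 ? -expected[i] : expected[i];
        if (scale < 1.0)
            scale = 1.0;
        if (diff > tol * scale)
            return (long)i;
    }
    return -1;
}
```

If first_mismatch reports index 0, the input probably never reached the device intact. If the first few elements agree and later ones do not, the buffer size or the global work size passed from the host is a more likely suspect.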

BTW, I suppose you are using a GPU that supports double precision, since you are doing the calculations in double. Check that you are also passing double values from the host. Remember that you can't point a float pointer at double data, or vice versa; that will give you wrong results.
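If the host data is stored as float, as in the PGI version, it has to be converted element by element before it is handed to a kernel that takes double pointers. A minimal sketch, assuming a hypothetical helper named widen_to_double that is not in the original program:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: copies n float samples into a newly allocated
 * array of doubles, converting each value. Returns NULL if the allocation
 * fails. The caller owns the result and must free() it. */
static double *widen_to_double(const float *src, size_t n)
{
    double *dst = malloc(n * sizeof *dst);
    if (dst == NULL)
        return NULL;
    for (size_t i = 0; i < n; i++)
        dst[i] = (double)src[i];
    return dst;
}
```

The device buffer then has to be sized as n * sizeof(double), not n * sizeof(float). Casting the float pointer itself, instead of converting the values, would make the kernel read each pair of floats as one double.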
