I have written a simple program that computes an autocorrelation, shown below. I used the PGI Accelerator directives to move the computation to the GPU.
    void autocorr(float *restrict A, float *restrict C, int N)
    {
        int i, j;
        float sum;
        #pragma acc region
        for (i = 0; i < N; i++) {
            sum = 0.0;
            for (j = 0; j < N; j++) {
                if ((i+j) < N)
                    sum += A[j] * A[i+j];
            }
            C[i] = sum;
        }
    }
I wrote a similar program in OpenCL, but I am not getting correct results. The kernel is shown below. I am new to GPU programming, so apart from hints that could fix my error, any other advice is welcome.
    __kernel void autocorrel1D(__global double *Vol_IN, __global double *Vol_AUTOCORR, int size)
    {
        int j, gid = get_global_id(0);
        double sum = 0.0;
        for (j = 0; j < size; j++) {
            if ((gid+j) < size)
                sum += Vol_IN[j] * Vol_IN[gid+j];
        }
        Vol_AUTOCORR[gid] = sum;
    }
Since I have set the number of work dimensions to 1, I am assuming that my get_global_id(0) call gives me the ID of the current work-item, which I use to index the 1D input array.
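To make that assumption concrete, here is a minimal host-side sketch in plain C of what the kernel above would compute if each work-item receives a distinct gid from 0 to size - 1. The names autocorrel1D_item and autocorrel1D_host are hypothetical and exist only for this illustration. The sketch assumes the global work size equals size, and it does not model anything device-specific, such as whether the device supports double or how the host buffers are typed. It may still be useful as a reference to compare the GPU output against.

```c
#include <assert.h>

/* Hypothetical stand-in for one work-item of the kernel.
   The gid parameter plays the role of get_global_id(0). */
static void autocorrel1D_item(const double *in, double *out, int size, int gid)
{
    double sum = 0.0;
    for (int j = 0; j < size; j++) {
        /* Same bounds check as in the kernel. */
        if (gid + j < size)
            sum += in[j] * in[gid + j];
    }
    out[gid] = sum;
}

/* Emulates a 1D launch, assuming the global work size equals size:
   one call per work-item, with gid running from 0 to size - 1. */
static void autocorrel1D_host(const double *in, double *out, int size)
{
    assert(size >= 0);
    for (int gid = 0; gid < size; gid++)
        autocorrel1D_item(in, out, size, gid);
}
```

For the input {1, 2, 3, 4}, calling autocorrel1D_host(in, out, 4) fills out with {30, 20, 11, 4}. If the OpenCL version disagrees with this on the same data, the difference likely lies outside the loop itself, for example in the launch size or in how the buffers are set up.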