CUDA atomicAdd example fails to yield correct output

Posted 2020-08-01 06:28

Question:

The following code was written with the goal of incrementing a 100 element array of floats by 1 ten times. In the output, I was expecting a 100 element array of 10.0f value for each element. Instead, I get random values. Can you please point out my error here?

__global__  void testAdd(float *a)
{
    float temp;
    for (int i = 0; i < 100 ; i++)
    {
        a[i] = atomicAdd(&a[i], 1.0f);
    }
}
void cuTestAtomicAdd(float *a)
{
    testAdd<<<1, 10>>>(a);
}

My goal is to understand the workings of atomic operations, so as to apply them elsewhere.

Answer 1:

That's not how we do an atomicAdd operation.

Just do it like this:

atomicAdd(&a[i], 1.0f);

and the variable in question (a[i]) will be updated.

The return value from an atomic function is generally the old value that was in the variable, before the atomic update.

So doing this:

a[i] = atomicAdd(&a[i], 1.0f);

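will atomically add 1.0f to a[i], and then (non-atomically) overwrite a[i] with the old value. With 10 threads doing this concurrently, those plain stores race with the other threads' atomic adds, so the final contents are unpredictable. That's almost certainly not what you want.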

Read the documentation:

The function returns old.
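
To make the role of the return value concrete, here is a minimal sketch (not part of the original answer) in which each thread keeps the returned old value in a local variable instead of writing it back to the updated location. The names takeTickets, runTicketDemo, counter and tickets are hypothetical and chosen only for illustration. Since every atomicAdd on the same address returns a distinct old value, the ten threads should receive the tickets 0 through 9 in some order, and the counter should end at 10.

```cuda
#include <cstdio>

// Hypothetical kernel: each thread atomically increments a shared counter
// and records the old value it got back as its "ticket".
__global__ void takeTickets(int *counter, int *tickets)
{
    // atomicAdd returns the value stored at the address before the add.
    int old = atomicAdd(counter, 1);
    tickets[threadIdx.x] = old;  // each thread writes only its own slot
}

// Launches 10 threads. Returns 0 if the tickets are a permutation of 0..9
// and the counter ends at 10, otherwise 1.
int runTicketDemo()
{
    const int n = 10;
    int *d_counter, *d_tickets;
    int h_counter = 0, h_tickets[n];
    bool seen[n] = {false};

    cudaMalloc((void **)&d_counter, sizeof(int));
    cudaMalloc((void **)&d_tickets, n * sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));
    takeTickets<<<1, n>>>(d_counter, d_tickets);
    // These blocking copies run after the kernel has finished.
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_tickets, d_tickets, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_counter);
    cudaFree(d_tickets);

    if (h_counter != n) return 1;
    for (int i = 0; i < n; i++) {
        int t = h_tickets[i];
        if (t < 0 || t >= n || seen[t]) return 1;  // out of range or duplicate
        seen[t] = true;
    }
    return 0;
}

int main()
{
    int rc = runTicketDemo();
    printf("%s\n", rc == 0 ? "Success" : "Failure");
    return rc;
}
```

Which thread gets which ticket is not specified, so the check only requires that each value from 0 to 9 appears exactly once.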

The following complete code demonstrates correct usage:

#include <cstdio>   // printf
#include <cstdlib>  // malloc

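// Every thread loops over all 100 elements, so with 10 threads each
// element receives 10 atomic increments.
__global__  void testAdd(float *a)
{
    for (int i = 0; i < 100 ; i++)
    {
        // The return value (the old contents of a[i]) is deliberately ignored.
        atomicAdd(&a[i], 1.0f);
    }
}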
void cuTestAtomicAdd(float *a)
{
    testAdd<<<1, 10>>>(a);
}

int main(){

  float *d_data, *h_data;
  h_data=(float *) malloc(100*sizeof(float));
  cudaMalloc((void **)&d_data, 100*sizeof(float));
  // All-zero bytes represent 0.0f, so this zero-initializes the array.
  cudaMemset(d_data, 0, 100*sizeof(float));
  cuTestAtomicAdd(d_data);
  // This blocking copy runs after the kernel has finished.
  cudaMemcpy(h_data, d_data, 100*sizeof(float), cudaMemcpyDeviceToHost);
  cudaFree(d_data);
  for (int i = 0; i < 100; i++)
    if (h_data[i] != 10.0f) {printf("mismatch at %d, was %f, should be %f\n", i, h_data[i], 10.0f); free(h_data); return 1;}
  printf("Success\n");
  free(h_data);
  return 0;
}
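
The complete code above does not check any CUDA return codes, so a failed allocation or kernel launch would only show up as a mismatch in the results. The following is a rough sketch of how the same test could report such failures, under the assumption that a small helper is acceptable; check and runCheckedAdd are hypothetical names, not part of the original answer. It relies only on the standard runtime calls cudaGetLastError, cudaDeviceSynchronize and cudaGetErrorString.

```cuda
#include <cstdio>

// Hypothetical helper: returns true on success, otherwise prints the error.
static bool check(cudaError_t err, const char *what)
{
    if (err == cudaSuccess) return true;
    fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
    return false;
}

// Same kernel as in the answer: every thread increments all 100 elements.
__global__ void testAdd(float *a)
{
    for (int i = 0; i < 100; i++)
    {
        atomicAdd(&a[i], 1.0f);
    }
}

// Returns 0 if every element is 10.0f, 1 on a mismatch, 2 on a CUDA error.
int runCheckedAdd()
{
    const int n = 100;
    float h_data[n];
    float *d_data = nullptr;

    bool ok = check(cudaMalloc((void **)&d_data, n * sizeof(float)), "cudaMalloc")
           && check(cudaMemset(d_data, 0, n * sizeof(float)), "cudaMemset");
    if (ok) {
        testAdd<<<1, 10>>>(d_data);
        ok = check(cudaGetLastError(), "kernel launch")          // bad launch configuration etc.
          && check(cudaDeviceSynchronize(), "kernel execution")  // errors raised while running
          && check(cudaMemcpy(h_data, d_data, n * sizeof(float),
                              cudaMemcpyDeviceToHost), "cudaMemcpy");
    }
    cudaFree(d_data);  // freeing a null pointer is a no-op
    if (!ok) return 2;

    for (int i = 0; i < n; i++)
        if (h_data[i] != 10.0f) return 1;
    return 0;
}

int main()
{
    int rc = runCheckedAdd();
    printf("%s\n", rc == 0 ? "Success" : "Failure");
    return rc;
}
```

Separating the launch check from the synchronization check makes it easier to tell whether a problem came from the launch itself or from the kernel while it was running.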


Tags: cuda