I wrote the following code to increment each element of a 100-element array of floats by 1, ten times. I expected every element of the output to be 10.0f. Instead, I get seemingly random values. Can you please point out my error here?
    __global__ void testAdd(float *a)
    {
        float temp;
        for (int i = 0; i < 100 ; i++)
        {
            a[i] = atomicAdd(&a[i], 1.0f);
        }
    }
    void cuTestAtomicAdd(float *a)
    {
        testAdd<<<1, 10>>>(a);
    }
My goal is to understand the workings of atomic operations, so as to apply them elsewhere.
That's not how we do an atomicAdd operation.
Just do it like this:
    atomicAdd(&a[i], 1.0f);
and the variable in question (a[i]) will be updated.
The return value from an atomic function is generally the old value that was in the variable, before the atomic update.
So doing this:
    a[i] = atomicAdd(&a[i], 1.0f);
will update the variable a[i], and then (non-atomically) assign the old value to the variable a[i]. That's almost certainly not what you want. With 10 threads doing this concurrently, that plain store can also overwrite increments made by other threads, which is why the results look random.
Read the documentation for atomicAdd, which states:
"The function returns old."
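To make the "returns old" behavior concrete, here is a minimal sketch of a case where the return value is actually wanted. It is not part of the original question: the kernel name takeTickets, the helper ticketsAreUnique, and the single-block launch are all assumptions made for illustration. Each thread atomically increments a shared integer counter and keeps the returned old value as its own "ticket", storing it in a separate array rather than writing it back over the counter.

```cuda
#include <cstdio>
#include <cstdlib>

// Hypothetical kernel: each thread takes a "ticket", i.e. the value the
// counter held just before this thread's own increment.
__global__ void takeTickets(int *counter, int *tickets)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    int old = atomicAdd(counter, 1);  // returns the pre-increment value
    tickets[tid] = old;               // stored elsewhere, never written back to *counter
}

// Hypothetical host helper: launches one block of n threads (assumes n <= 1024)
// and returns true if the counter ended at n and the tickets are exactly
// 0..n-1 in some order.
bool ticketsAreUnique(int n)
{
    int *d_counter, *d_tickets;
    cudaMalloc((void **)&d_counter, sizeof(int));
    cudaMalloc((void **)&d_tickets, n * sizeof(int));
    cudaMemset(d_counter, 0, sizeof(int));

    takeTickets<<<1, n>>>(d_counter, d_tickets);

    int h_counter = -1;
    int *h_tickets = (int *)malloc(n * sizeof(int));
    cudaMemcpy(&h_counter, d_counter, sizeof(int), cudaMemcpyDeviceToHost);
    cudaMemcpy(h_tickets, d_tickets, n * sizeof(int), cudaMemcpyDeviceToHost);

    // Check that every value in 0..n-1 was handed out exactly once.
    bool *seen = (bool *)calloc(n, sizeof(bool));
    bool ok = (h_counter == n);
    for (int i = 0; i < n && ok; i++) {
        int t = h_tickets[i];
        if (t < 0 || t >= n || seen[t]) ok = false;
        else seen[t] = true;
    }

    free(seen);
    free(h_tickets);
    cudaFree(d_counter);
    cudaFree(d_tickets);
    return ok;
}

int main()
{
    printf("%s\n", ticketsAreUnique(10) ? "tickets unique" : "unexpected tickets");
    return 0;
}
```

Which thread gets which ticket is not deterministic, but because the read-add-write is one atomic transaction, no two threads should ever see the same old value. If you only need the variable updated, as in the question, simply ignore the return value.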
The following complete code demonstrates correct usage:
    #include <cstdio>   // printf
    #include <cstdlib>  // malloc

    // Every thread atomically increments all 100 elements once.
    __global__ void testAdd(float *a)
    {
        for (int i = 0; i < 100 ; i++)
        {
            atomicAdd(&a[i], 1.0f);  // return value (the old value) is deliberately ignored
        }
    }

    void cuTestAtomicAdd(float *a)
    {
        testAdd<<<1, 10>>>(a);  // 1 block of 10 threads, so each element ends at 10.0f
    }

    int main(){
        float *d_data, *h_data;
        h_data=(float *) malloc(100*sizeof(float));
        cudaMalloc((void **)&d_data, 100*sizeof(float));
        cudaMemset(d_data, 0, 100*sizeof(float));  // start from all zeros
        cuTestAtomicAdd(d_data);
        // this cudaMemcpy waits for the kernel to finish before copying
        cudaMemcpy(h_data, d_data, 100*sizeof(float), cudaMemcpyDeviceToHost);
        for (int i = 0; i < 100; i++)
            if (h_data[i] != 10.0f) {printf("mismatch at %d, was %f, should be %f\n", i, h_data[i], 10.0f); return 1;}
        printf("Success\n");
        return 0;
    }