I have tried below program using atomicInc().
__global__ void ker(int *count)
{
int n=1;
int x = atomicInc ((unsigned int *)&count[0],n);
CUPRINTF("In kernel count is %d\n",count[0]);
}
int main()
{
int hitCount[1];
int *hitCount_d;
hitCount[0]=1;
cudaMalloc((void **)&hitCount_d,1*sizeof(int));
cudaMemcpy(&hitCount_d[0],&hitCount[0],1*sizeof(int),cudaMemcpyHostToDevice);
ker<<<1,4>>>(hitCount_d);
cudaMemcpy(&hitCount[0],&hitCount_d[0],1*sizeof(int),cudaMemcpyDeviceToHost);
printf("count is %d\n",hitCount[0]);
return 0;
}
Output is:
In kernel count is 1
In kernel count is 1
In kernel count is 1
In kernel count is 1
count is 1
I'm not understanding why it is not incrementing. Can anyone help
Referring to the documentation, atomicInc
does this:
for the following:
atomicInc ((unsigned int *)&count[0],n);
compute:
((count[0] >= n) ? 0 : (count[0]+1))
and store the result back in count[0]
(If you're not sure what the ?
operator does, look here)
Since you've passed n
= 1, and count[0]
starts out at 1, atomicInc
never actually increments the variable count[0]
beyond 1.
If you want to see it increment beyond 1, pass a larger value for n
.
The variable n
actually acts as a "rollover value" for the incrementing process. When the variable to be incremented actually reaches the value of n
, the next atomicInc
will reset it to zero.
Although you haven't asked the question, you might ask, "Why do I never see a value of zero, if I am hitting the rollover value?"
To answer this, you must remember that all 4 of your threads are executing in lockstep. All 4 of them execute the atomicInc
instruction before any execute the subsequent print statement.
Therefore we have a variable of count[0]
which starts out at 1.
- The first thread to execute the atomic resets it to zero.
- The next thread increments it to 1.
- The third thread resets it to zero.
- The fourth and final thread increments it to 1.
Then all 4 threads print out the value.
As another experiment, try launching 5 threads instead of 4, see if you can predict what the value printed out will be.
ker<<<1,5>>>(hitCount_d);
As @talonmies indicated in the comments, if you swap your atomicInc
for an atomicAdd
:
int x = atomicAdd ((unsigned int *)&count[0],n);
You'll get results that you were probably expecting.