atomicInc() is not working

2019-08-25 06:17发布

问题:

I have tried below program using atomicInc().

__global__ void ker(int *count)
{
    int n=1;
    int x = atomicInc ((unsigned int *)&count[0],n);
    CUPRINTF("In kernel count is %d\n",count[0]);
}

int main()
{
    int hitCount[1];
    int *hitCount_d;

    hitCount[0]=1;
    cudaMalloc((void **)&hitCount_d,1*sizeof(int));

    cudaMemcpy(&hitCount_d[0],&hitCount[0],1*sizeof(int),cudaMemcpyHostToDevice);

    ker<<<1,4>>>(hitCount_d);

    cudaMemcpy(&hitCount[0],&hitCount_d[0],1*sizeof(int),cudaMemcpyDeviceToHost);

    printf("count is %d\n",hitCount[0]);
  return 0;
}

Output is:

In kernel count is 1
In kernel count is 1
In kernel count is 1
In kernel count is 1

count is 1

I'm not understanding why it is not incrementing. Can anyone help

回答1:

Referring to the documentation, atomicInc does this:

for the following:

atomicInc ((unsigned int *)&count[0],n);

compute:

((count[0] >= n) ? 0 : (count[0]+1))

and store the result back in count[0]

(If you're not sure what the ? operator does, look here)

Since you've passed n = 1, and count[0] starts out at 1, atomicInc never actually increments the variable count[0] beyond 1.

If you want to see it increment beyond 1, pass a larger value for n.

The variable n actually acts as a "rollover value" for the incrementing process. When the variable to be incremented actually reaches the value of n, the next atomicInc will reset it to zero.

Although you haven't asked the question, you might ask, "Why do I never see a value of zero, if I am hitting the rollover value?"

To answer this, you must remember that all 4 of your threads are executing in lockstep. All 4 of them execute the atomicInc instruction before any execute the subsequent print statement.

Therefore we have a variable of count[0] which starts out at 1.

  1. The first thread to execute the atomic resets it to zero.
  2. The next thread increments it to 1.
  3. The third thread resets it to zero.
  4. The fourth and final thread increments it to 1.

Then all 4 threads print out the value.

As another experiment, try launching 5 threads instead of 4, see if you can predict what the value printed out will be.

ker<<<1,5>>>(hitCount_d);

As @talonmies indicated in the comments, if you swap your atomicInc for an atomicAdd:

int x = atomicAdd ((unsigned int *)&count[0],n);

You'll get results that you were probably expecting.