Recently I started to develop on CUDA and faced with the problem with atomicCAS(). To do some manipulations with memory in device code I have to create a mutex, so that only one thread could work with memory in critical section of code.
The device code below runs on 1 block and several threads.
__global__ void cudaKernelGenerateRandomGraph(..., int* mutex)
{
int i = threadIdx.x;
...
do
{
atomicCAS(mutex, 0, 1 + i);
}
while (*mutex != i + 1);
//critical section
//do some manipulations with objects in device memory
*mutex = 0;
...
}
When first thread executes
atomicCAS(mutex, 0, 1 + i);
mutex is 1. After that first thread changes its status from Active to Inactive, and line
*mutex = 0;
is not executed. Other threads stays forever in loop. I have tried many variants of this cycle like while(){};, do{}while();, with temp variable = *mutex inside loop, even variant with if(){} and goto. But result is the same.
The host part of code:
...
int verticlesCount = 5;
int *mutex;
cudaMalloc((void **)&mutex, sizeof(int));
cudaMemset(mutex, 0, sizeof(int));
cudaKernelGenerateRandomGraph<<<1, verticlesCount>>>(..., mutex);
I use Visual Studio 2012 with CUDA 5.5.
The device is NVidia GeForce GT 240 with compute capability 1.2.
Thanks in advance.
UPD: After some time working on my diploma project this spring, I found a solution for critical section on cuda. This is a combination of lock-free and mutex mechanisms. Here is working code. Used it to impelment atomic dynamic-resizable array.
// *mutex should be 0 before calling this function
__global__ void kernelFunction(..., unsigned long long* mutex)
{
bool isSet = false;
do
{
if (isSet = atomicCAS(mutex, 0, 1) == 0)
{
// critical section goes here
}
if (isSet)
{
mutex = 0;
}
}
while (!isSet);
}