I would like to ask whether there is a better way to combine two atomic operations.
My goal is to find the highest result for each of a set of K equations (more than 32) over a list of J parameter values (much like a two-dimensional input), and to save both the maximum value and the j index that produced it.
if (atomicMax(&max_k[id], t_max) < t_max) atomicExch(&indexMax[id],t_pos);
Initially we used the approach above. However, any thread may find a value higher than the current maximum, so several threads in the same warp can pass the test. Take B > C > A (thread B has the highest value and thread C has a higher value than A): all three can see an old maximum below their own value. I'm not sure, but I believe the atomicExch calls can execute in a different thread order than the atomicMax calls did (is that correct?), in which case indexMax[id] could end up holding C's index while max_k[id] holds B's value. So we tried a critical section, but it led to a deadlock. In the end, the solution below seems to work.
Is there a better way, or is there any issue with the following code?
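To make the ordering I am worried about concrete, here is a small host-side replay in plain C++ (not GPU code). It is only a sketch: the values 1, 2, 3, the indices, and the names Slot, fakeAtomicMax and replayBadInterleaving are made up for illustration, and it assumes the hardware is allowed to order the two atomics independently.

```cpp
#include <algorithm>
#include <climits>

// Shared state that the two separate atomics update on the GPU.
struct Slot {
    int maxVal = INT_MIN;
    int index  = -1;
};

// Host stand-in for atomicMax: stores the larger value, returns the old one.
int fakeAtomicMax(int* addr, int val) {
    int old = *addr;
    *addr = std::max(old, val);
    return old;
}

// Replays one assumed ordering: the atomicMax calls run as A, C, B,
// but the atomicExch calls run as A, B, C.
Slot replayBadInterleaving() {
    Slot s;
    const int valA = 1, valB = 3, valC = 2;   // B > C > A
    const int idxA = 0, idxB = 1, idxC = 2;

    bool winA = fakeAtomicMax(&s.maxVal, valA) < valA;  // old = INT_MIN
    bool winC = fakeAtomicMax(&s.maxVal, valC) < valC;  // old = 1
    bool winB = fakeAtomicMax(&s.maxVal, valB) < valB;  // old = 2

    if (winA) s.index = idxA;   // atomicExch by A
    if (winB) s.index = idxB;   // atomicExch by B
    if (winC) s.index = idxC;   // atomicExch by C lands last
    return s;                   // value comes from B, index comes from C
}
```

With this ordering the stored maximum is B's value while the stored index is C's, which is the mismatch described above.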
__device__ int atomicMaxCAS(int* addressMax, int valMax, int* addressCAS, int valCas) {
    int oldCas = *addressCAS, assumedCas;
    int oldMax = *addressMax, assumedMax;
    do {
        assumedCas = oldCas;
        assumedMax = oldMax;
        oldMax = atomicMax(addressMax, valMax);
        if (oldMax < valMax) oldCas = atomicCAS(addressCAS, assumedCas, valCas);
    } while (assumedCas != oldCas || assumedMax != oldMax);
    return (oldMax);
}
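One alternative that might avoid the two-atomic problem entirely is to pack the value and its index into a single 64-bit word and update it with one atomicMax. The following is a minimal sketch, not a tested replacement. It assumes the values fit in a signed 32-bit int, the j index is a non-negative 32-bit int, the device supports the 64-bit unsigned atomicMax, and a single array of packed words (zero-initialised by the host) replaces max_k and indexMax. The names HD, packValueIndex, unpackValue, unpackIndex and atomicMaxWithIndex are hypothetical.

```cpp
#include <cstdint>

// Let the packing helpers also compile as plain C++ for host-side checks.
#ifdef __CUDACC__
#define HD __host__ __device__
#else
#define HD
#endif

// Value goes in the high 32 bits, index in the low 32 bits.
// Flipping the sign bit maps signed ints to unsigned keys with the same ordering.
HD inline unsigned long long packValueIndex(int value, int index) {
    unsigned int key = static_cast<unsigned int>(value) ^ 0x80000000u;
    return (static_cast<unsigned long long>(key) << 32) |
           static_cast<unsigned int>(index);
}

HD inline int unpackValue(unsigned long long packed) {
    return static_cast<int>(static_cast<unsigned int>(packed >> 32) ^ 0x80000000u);
}

HD inline int unpackIndex(unsigned long long packed) {
    return static_cast<int>(static_cast<unsigned int>(packed & 0xFFFFFFFFull));
}

#ifdef __CUDACC__
// Single atomic update: value and index can never come from different threads.
// *best is assumed to start at 0, the packed form of (INT_MIN, index 0).
__device__ void atomicMaxWithIndex(unsigned long long* best, int value, int index) {
    atomicMax(best, packValueIndex(value, index));
}
#endif
```

Because the value occupies the high bits, comparing packed words compares values first. On equal values the larger index wins, so this would need adjusting if the smallest j should be kept on ties.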
Thanks in advance! I was able to start writing CUDA thanks to all the posts about it.