I'd like to implement this atomic function in CUDA:
__device__ float lowest; // global var
__device__ int lowIdx; // global var
float realNum; // thread reg var
int index; // thread reg var
if (realNum < lowest) {
    lowest = realNum;  // the new lowest
    lowIdx = index;    // update the 'low' index
}
I don't believe I can do this with any of the built-in atomic functions, because I need to lock down a couple of global memory locations for a couple of instructions. Might I be able to implement this with PTX (assembly) code?
@Robert Crovella: Excellent idea, but I think the function should be modified a little bit as follows:
As I stated in my second comment above, it's possible to combine your two 32-bit quantities into a single 64-bit quantity and manage that quantity atomically, using the arbitrary atomic example (a compare-and-swap loop built on atomicCAS) as a rough guide. Obviously you can't extend this idea beyond two 32-bit quantities. Here's an example:
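A minimal sketch of the packing idea described above might look like the following. It is not the code from the answer itself. The names `lowestPacked`, `packValIdx`, `unpackVal`, `atomicLowest`, `findLowestKernel` and `findLowest` are hypothetical, and the layout (float bits in the upper 32 bits, index in the lower 32 bits) is an assumption chosen for illustration. Error checking on the CUDA runtime calls is omitted for brevity.

```cuda
#include <cassert>
#include <cfloat>
#include <cstring>
#include <cuda_runtime.h>

// Assumed layout: bits of the float in the high 32 bits, index in the low 32 bits.
__device__ unsigned long long lowestPacked;

__device__ unsigned long long packValIdx(float val, int idx) {
    return ((unsigned long long)__float_as_uint(val) << 32) | (unsigned int)idx;
}

__device__ float unpackVal(unsigned long long packed) {
    return __uint_as_float((unsigned int)(packed >> 32));
}

// Replace the stored (value, index) pair if val is strictly lower.
// Both halves change in one 64-bit atomicCAS, so no other thread can
// observe a new value paired with a stale index.
__device__ void atomicLowest(unsigned long long *addr, float val, int idx) {
    unsigned long long candidate = packValIdx(val, idx);
    unsigned long long old = *addr;
    while (val < unpackVal(old)) {
        unsigned long long assumed = old;
        old = atomicCAS(addr, assumed, candidate);
        if (old == assumed) break;  // our swap went through
        // otherwise another thread updated the pair first; re-test against it
    }
}

__global__ void findLowestKernel(const float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicLowest(&lowestPacked, data[i], i);
}

// Hypothetical host helper: copies the input to the device, runs the kernel,
// and unpacks the result. The index stays -1 if no element is below FLT_MAX.
void findLowest(const float *h_data, int n, float *lowVal, int *lowIdx) {
    assert(n > 0);
    float *d_data = nullptr;
    cudaMalloc(&d_data, n * sizeof(float));
    cudaMemcpy(d_data, h_data, n * sizeof(float), cudaMemcpyHostToDevice);

    // Start from (FLT_MAX, -1) so any smaller input value replaces it.
    float startVal = FLT_MAX;
    unsigned int startBits;
    memcpy(&startBits, &startVal, sizeof(startBits));
    unsigned long long init = ((unsigned long long)startBits << 32) | 0xFFFFFFFFu;
    cudaMemcpyToSymbol(lowestPacked, &init, sizeof(init));

    findLowestKernel<<<(n + 255) / 256, 256>>>(d_data, n);

    // Blocking copy: waits for the kernel on the default stream to finish.
    unsigned long long result;
    cudaMemcpyFromSymbol(&result, lowestPacked, sizeof(result));

    unsigned int valBits = (unsigned int)(result >> 32);
    memcpy(lowVal, &valBits, sizeof(float));
    *lowIdx = (int)(unsigned int)(result & 0xFFFFFFFFu);
    cudaFree(d_data);
}
```

Calling `findLowest(h, n, &v, &i)` on a host array `h` would be one way to drive it. Note that when several elements share the lowest value, whichever thread gets there first keeps its index, so the reported index is not deterministic in that case. In practice a block-level reduction followed by one atomic per block is likely to scale better than one atomic per thread.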