I need an atomic max function for floats in OpenCL. This is my current naive code, using atomic_xchg:
float value = data[index];
if ( value > *max_value )
{
    atomic_xchg(max_value, value);
}
This code gives the correct result on an Intel CPU, but not on an NVIDIA GPU. Is this code correct, and if not, how can I fix it?
You can do it like this:
//Function to perform the atomic max
inline void AtomicMax(volatile __global float *source, const float operand) {
    // The unions let the same 32 bits be viewed as a float or as an unsigned int
    union {
        unsigned int intVal;
        float floatVal;
    } newVal;
    union {
        unsigned int intVal;
        float floatVal;
    } prevVal;
    do {
        // Read the current value and compute the candidate maximum
        prevVal.floatVal = *source;
        newVal.floatVal = max(prevVal.floatVal, operand);
        // Store only if *source still holds the value read above; otherwise retry
    } while (atomic_cmpxchg((volatile __global unsigned int *)source, prevVal.intVal, newVal.intVal) != prevVal.intVal);
}
__kernel void mykern(__global float *data, __global float *max_value) {
    unsigned int index = get_global_id(0);
    float value = data[index];
    AtomicMax(max_value, value);
}
As stated in LINK.
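What it does is create a union of a float and an unsigned int, so the same 32 bits can be read either way. The max is computed on the float, but the update goes through atomic_cmpxchg on the integer view, which writes the new value only if the memory still holds the bit pattern that was read at the start of the iteration. If another work-item changed the value in between, the comparison fails and the loop retries with the fresh value. Your original code fails because the comparison and the atomic_xchg are two separate steps, so another work-item can store a larger value in between and then have it overwritten by a smaller one.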
However, these methods are slow: every work-item that calls AtomicMax contends for the same memory location, so the updates are largely serialized. Use them carefully.
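If it helps to see the retry loop outside of a kernel, here is a minimal host-side sketch in plain C11. It is only an analogue of AtomicMax, not OpenCL code: the names atomic_max_float, float_to_bits and bits_to_float are made up for this illustration, the shared value is assumed to be stored as its 32-bit pattern in an _Atomic uint32_t, and memcpy stands in for the union.

```c
#include <assert.h>     /* static_assert */
#include <stdatomic.h>  /* atomic_load, atomic_compare_exchange_weak */
#include <stdint.h>
#include <string.h>     /* memcpy */

static_assert(sizeof(float) == sizeof(uint32_t), "float must be 32 bits wide");

/* Reinterpret the bits of a float as a 32-bit integer (hypothetical helper). */
static uint32_t float_to_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reinterpret a 32-bit integer as a float (hypothetical helper). */
static float bits_to_float(uint32_t u)
{
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Hypothetical host-side analogue of AtomicMax.
   *target holds the bit pattern of the shared float. */
static void atomic_max_float(_Atomic uint32_t *target, float operand)
{
    uint32_t seen = atomic_load(target);
    for (;;) {
        /* Compute the candidate maximum from the value we last saw. */
        float current = bits_to_float(seen);
        float wanted = operand > current ? operand : current;
        /* Store only if *target still equals `seen`. On failure the call
           copies the value now in *target into `seen`, and we retry. */
        if (atomic_compare_exchange_weak(target, &seen, float_to_bits(wanted)))
            return;
    }
}
```

A caller would initialise the cell with atomic_init(&cell, float_to_bits(start)) and read the result back with bits_to_float(atomic_load(&cell)). The weak compare-exchange may fail spuriously, which is harmless here because the loop simply tries again.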