I'm doing a reduction (finding the minimum and maximum) of a float[]
array on a GPU through OpenCL.
I'm loading the some elements from global
memory into local
memory for each workgroup. When the global size isn't a multiple of the workgroup size, I pad the global size, such that it becomes a multiple of the global size. Work-items past the end of the array put the neutral element of the reduction into local
memory.
But what should that neutral element be for max()
-- the maximum function?
The OpenCL documentation gives MAXFLOAT
, HUGE_VALF
and INFINITY
as very large positive (or unsigned) float
values.
Does it makes sense to have the neutral element to be -INFINITY
for example?
Right now I'm using HUGE_VALF
as the neutral element for min()
, but the docs also say that HUGE_VALF
is used as an error value, so maybe that's a bad idea.
Reduction kernel (Code):
#define NEUTRAL_ELEMENT HUGE_VALF
#define REDUCTION_OP min
__kernel void reduce(__global float* weights,
__local float* weights_cached
)
{
unsigned int id = get_global_id(0);
// Load data
if (id < {{ point_count }}) {
weights_cached[get_local_id(0)] = weights[id];
} else {
weights_cached[get_local_id(0)] = NEUTRAL_ELEMENT;
}
barrier(CLK_LOCAL_MEM_FENCE);
// Reduce
for(unsigned int stride = get_local_size(0) / 2; stride >= 1; stride /= 2) {
if (get_local_id(0) < stride) {
weights_cached[get_local_id(0)] = REDUCTION_OP(weights_cached[get_local_id(0)], weights_cached[get_local_id(0) + stride]);
barrier(CLK_LOCAL_MEM_FENCE);
}
// Save
weights[get_group_id(0)] = weights_cached[0];
}
Edit:
I actually ended up using fmin()
and fmax()
together with NAN
as the neutral element -- this is basically guaranteed to work according to the OpenCL documentation as the numerical value will always be returned (NAN
is only returned if two NAN
values are given).