Are two (or more) different threads allowed to write to the same memory location in global space in OpenCL? The write always changes a uchar from 0 to 1, so the outcome should be predictable, but I'm getting erratic results in my program, and I'm wondering whether some of the writes are failing.
Could it help to declare the buffer write-only and copy it to a read-only buffer afterwards?
Did you try using the `cl_khr_global_int32_base_atomics` extension and the `atom_inc` intrinsic function? As a proof of concept, I would first store the data in an `int32` instead of a `uchar`, and only then optimize the memory footprint of the data structures.
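
A minimal kernel sketch of that suggestion follows. It is an illustration, not your actual code: the kernel name `mark_hits` and the buffers `targets` and `flags` are hypothetical, and it assumes the host zeroes `flags` before the launch and that the device reports `cl_khr_global_int32_base_atomics` (from OpenCL 1.1 on, the same operations are also available in core as `atomic_inc` and friends).

```c
// OpenCL C kernel source (compiled by the OpenCL runtime, not a host C compiler).
// Enables the 32-bit global atomics extension named in the answer.
#pragma OPENCL EXTENSION cl_khr_global_int32_base_atomics : enable

// Hypothetical names: `mark_hits`, `targets`, `flags`.
// targets[i] : index of the flag that work-item i wants to set
// flags      : one 32-bit int per flag, zero-initialized by the host
__kernel void mark_hits(__global const uint *targets,
                        __global int *flags)
{
    size_t gid = get_global_id(0);
    uint slot = targets[gid];

    // Several work-items may pick the same slot. atom_inc performs the
    // read-modify-write as one indivisible operation, so no update is lost.
    atom_inc(&flags[slot]);
}
```

After the kernel finishes, a slot counts as "set" when `flags[slot] != 0`; the value itself is the number of work-items that hit it. If the erratic results disappear with this version, the concurrent `uchar` writes are the likely culprit, and you can then look at shrinking the buffer again.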