OpenCL select/delete points from large array

I have an array of 2M+ points (planned to be increased to 20M in due course) that I am running calculations on via OpenCL. I'd like to delete any points that fall within a random triangle geometry.

How can I do this within an OpenCL kernel process?

I can already:

identify those points that fall outside the triangle (simple point in poly algorithm in the kernel)
pass their coordinates to a global output array.

But:

An OpenCL global output array cannot have a variable size, so I initialise it to the same size as the input array of points.
As a result, 0,0 points occur in the final output wherever a point falls within the triangle.
The output array is therefore no smaller than the input, so nothing has actually been removed.

Can the 0,0 points be deleted within the OpenCL context?

N.B. I am coding in OpenFrameworks, so the C++ implementation links to .cl files.

Tags: opencl

3 Answers

家丑人穷心不美

#2 · 2019-02-25 00:58

There are several alternatives, each working better or worse depending on what the data looks like. I describe one below.

Deleting the identified points can also be done by registering them in a separate array per work-group. You need the same atomic_inc as in Moises's answer (see my remark there about doing this at work-group level!). The end result is a list of start and end points of the parts that don't need to be deleted. Different threads can then copy those parts of the array. This is less effective if you have clusters of points that need to be deleted.
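The idea of keeping a list of start and end points can be sketched on the CPU as follows. This is only a serial C++ illustration, not OpenCL code: the names `Point`, `keptRuns` and `compactByRuns` are hypothetical, and the per-point delete flag is assumed to have been produced already by the point-in-triangle test.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct Point { float x, y; };

// Hypothetical helper: collapse a per-point "delete" flag array into
// half-open [begin, end) index ranges of points that should be kept.
std::vector<std::pair<std::size_t, std::size_t>>
keptRuns(const std::vector<char>& deleteFlag) {
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    const std::size_t n = deleteFlag.size();
    std::size_t i = 0;
    while (i < n) {
        while (i < n && deleteFlag[i]) ++i;    // skip points marked for deletion
        if (i == n) break;
        const std::size_t begin = i;
        while (i < n && !deleteFlag[i]) ++i;   // extend the current kept run
        runs.emplace_back(begin, i);
    }
    return runs;
}

// Copy each kept run to its final position. A run's destination offset is the
// total length of the runs before it, so the copies do not overlap and could
// be handed to different threads or work-groups.
std::vector<Point> compactByRuns(const std::vector<Point>& in,
                                 const std::vector<char>& deleteFlag) {
    assert(in.size() == deleteFlag.size());
    const auto runs = keptRuns(deleteFlag);
    std::vector<std::size_t> offset;
    std::size_t total = 0;
    for (const auto& r : runs) {
        offset.push_back(total);
        total += r.second - r.first;
    }
    std::vector<Point> out(total);
    for (std::size_t k = 0; k < runs.size(); ++k)
        std::copy(in.begin() + runs[k].first, in.begin() + runs[k].second,
                  out.begin() + offset[k]);
    return out;
}
```

Unlike an atomic-based append, this variant preserves the original order of the kept points.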


甜甜的少女心

#3 · 2019-02-25 01:00

If I understood your problem, you can do:

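In your kernel, identify the points in the triangle and then:

// assuming element is a float2 array and number_of_elems is a __global int * zeroed by the host
if(element[idx].x != 0.0f || element[idx].y != 0.0f)
      output_array[atomic_inc(number_of_elems)] = element[idx];

Afterwards, on the host, the first number_of_elems entries of output_array hold the points that passed the test.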
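To make the role of the counter concrete, here is a minimal serial C++ stand-in for the snippet above, in which each loop iteration plays the part of one work-item. `Point`, `keep` and `compactAtomic` are hypothetical names, and `std::atomic::fetch_add(1)` is used because, like `atomic_inc`, it returns the value the counter held before the increment.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

struct Point { float x, y; };

// Hypothetical predicate: true for points that should be kept.
// Here "kept" means "not the (0,0) placeholder", as in the snippet above.
inline bool keep(const Point& p) { return p.x != 0.0f || p.y != 0.0f; }

// Serial stand-in for the kernel: each loop iteration plays one work-item.
// Returns the number of valid entries written to the front of `out`.
std::size_t compactAtomic(const std::vector<Point>& in, std::vector<Point>& out) {
    out.resize(in.size());                      // worst case: every point is kept
    std::atomic<int> count{0};                  // must start at zero, like the device counter
    for (std::size_t idx = 0; idx < in.size(); ++idx) {
        if (keep(in[idx])) {
            const int slot = count.fetch_add(1);   // old value = this item's unique slot
            assert(static_cast<std::size_t>(slot) < out.size());
            out[slot] = in[idx];
        }
    }
    return static_cast<std::size_t>(count.load());
}
```

On a real device the work-items run concurrently, so the kept points land in `output_array` in no particular order, and the host has to read the counter back to know how many entries are valid. Note also that a genuine input point at (0,0) would be dropped by this test.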

I hope this helps.


神经病院院长

#4 · 2019-02-25 01:07

Just an alternative for the case where most of the points satisfy the condition guarded by the atomic:

It is possible to keep a local counter and increment it with a local atomic. To merge that count into the global value, use atomic_add(), which returns the "previous" global value. That value is the start offset of the work-group's slice of the output, so you just copy the results to that address and upwards.

It should give a noticeable speed-up, since the threads synchronise locally and touch the global counter only once per work-group. The global copy can run in parallel since the addresses never overlap.

For example:

__kernel void mykernel(__global MyType * global_out, volatile __global int * global_count, __global const MyType * global_in){
   int lid = get_local_id(0);
   int lws = get_local_size(0);
   int idx = get_global_id(0);

   __local int local_count;
   __local int global_val;
   //I am using a local container, but a local array of pointers to global is possible as well
   __local MyType local_out[WG_SIZE]; //WG_SIZE must be a compile-time constant >= your work-group size
   if(lid==0){
      local_count = 0; global_val = -1;
   }
   barrier(CLK_LOCAL_MEM_FENCE);

   //Classify them (condition left elided, as in the original)
   if(global_in[idx] == ....)
       local_out[atomic_inc(&local_count)] = global_in[idx]; //atomic_inc takes a pointer

   barrier(CLK_LOCAL_MEM_FENCE);

   //If nothing was selected in this work-group, we are done
   if(local_count > 0){
      //Only the first local ID does the atomic to global
      if(lid == 0)
         global_val = atomic_add(global_count,local_count);

      //Resync all the local workers here
      barrier(CLK_LOCAL_MEM_FENCE);

      //Copy all the data: each work-item starts at its own lid and strides by lws
      for(int i=lid; i<local_count; i+=lws)
          global_out[global_val+i] = local_out[i];
   }
}

NOTE: I haven't compiled this, but it should more or less work.
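The reservation pattern used by the kernel can be checked on the CPU with a small serial C++ stand-in, where each outer iteration plays one work-group. This is a sketch under assumptions: `compactPerGroup` and the `keepFlag` array (the result of the classification step) are hypothetical, and plain `int` elements stand in for `MyType`.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Serial stand-in: each outer iteration plays one work-group of `wgSize` items.
// `keepFlag[i]` is a hypothetical per-element result of the classification step.
// Returns the number of global atomic operations that were performed.
std::size_t compactPerGroup(const std::vector<int>& in,
                            const std::vector<char>& keepFlag,
                            std::size_t wgSize,
                            std::vector<int>& out,
                            std::atomic<int>& globalCount) {
    assert(in.size() == keepFlag.size() && wgSize > 0);
    out.resize(in.size());
    std::size_t globalAtomics = 0;
    std::vector<int> localOut(wgSize);                   // plays the __local buffer
    for (std::size_t base = 0; base < in.size(); base += wgSize) {
        const std::size_t end = std::min(base + wgSize, in.size());
        int localCount = 0;                              // plays the __local counter
        for (std::size_t i = base; i < end; ++i)
            if (keepFlag[i]) localOut[localCount++] = in[i];
        if (localCount == 0) continue;                   // nothing to publish
        // One global atomic per group reserves a contiguous, private range.
        const int start = globalCount.fetch_add(localCount);
        ++globalAtomics;
        std::copy(localOut.begin(), localOut.begin() + localCount,
                  out.begin() + start);
    }
    return globalAtomics;
}
```

With eight elements, a group size of four and three kept elements all in the first group, this performs a single global atomic instead of three, which is where the expected speed-up comes from.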

