I am trying to understand exactly how CL_MEM_USE_HOST_PTR and CL_MEM_COPY_HOST_PTR work. As I understand it, when using CL_MEM_USE_HOST_PTR, say when creating a 2D image, nothing is copied to the device. Instead, the GPU refers to the mapped memory on the host (clEnqueueMapBuffer maps it), does the processing, and we can write the results to some other location.
On the other hand, if I use CL_MEM_COPY_HOST_PTR, it will create a copy of the data pointed to by host_ptr on the device (I guess it creates a separate copy rather than just caching). The processing is then done on the data that was copied to the device, and the results are copied back to the host. I hope I have understood this correctly.
So my question is this, and it is purely out of curiosity. I will use CL_MEM_USE_HOST_PTR, and even though the device can access the host memory, I want the GPU kernel to create a separate copy on the device itself (not using CL_MEM_COPY_HOST_PTR, because that copy is again made by the host) and then do the processing on this data. How can this be done?
CL_MEM_COPY_HOST_PTR - in practice, the cl_mem object allocates memory on the device and copies the data specified by the host pointer into it. Any modifications to the buffer object on the device side will not be visible to the host side.
CL_MEM_USE_HOST_PTR - the cl_mem object uses the memory referred to by host_ptr, so the device can directly modify the data allocated on the host, and in this way we avoid a separate data transfer. Note that the spec still allows an implementation to cache the contents in device memory.
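To make the two flags concrete, here is a minimal host-side sketch in C. It assumes a valid cl_context already exists and that an OpenCL SDK is installed; the helper name create_both and its parameters are made up for illustration. Whether the USE_HOST_PTR buffer is really accessed in place depends on the implementation.

```c
#include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */
#include <stddef.h>

/* Hypothetical helper: creates one buffer per flag from the same host array.
 * Returns CL_SUCCESS or the first error reported by clCreateBuffer. */
cl_int create_both(cl_context ctx, float *host_data, size_t n,
                   cl_mem *copied, cl_mem *shared)
{
    cl_int err;

    /* COPY_HOST_PTR: the contents of host_data are copied at creation time.
     * host_data may be modified or freed afterwards without affecting the buffer. */
    *copied = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                             n * sizeof(float), host_data, &err);
    if (err != CL_SUCCESS) return err;

    /* USE_HOST_PTR: host_data itself serves as the buffer's storage,
     * so it must stay valid for the whole lifetime of the buffer. */
    *shared = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_USE_HOST_PTR,
                             n * sizeof(float), host_data, &err);
    if (err != CL_SUCCESS) {
        clReleaseMemObject(*copied);
        *copied = NULL;
    }
    return err;
}
```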
OpenCL buffers generally have a copy of their "bits" in host memory (this is what the OpenCL spec calls the contents of a buffer). This is necessary because device memory is limited, and the bits are usually transferred to the device only when they are used by kernels.
When you create a buffer with USE_HOST_PTR, you allow the OpenCL runtime to use the host_ptr location for this host memory copy. When a kernel uses the buffer, the bits are copied to the device. After execution, you need to make sure the bits are synchronized back to your host memory. This is done by calling clEnqueueMapBuffer, and the pointer returned by this function will be inside your host memory area.
When you create a buffer with COPY_HOST_PTR, the runtime allocates a new host memory copy of the buffer, and copies your bits into it. Usually, nothing is sent to the device at this point.
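The synchronization step for a USE_HOST_PTR buffer can be sketched roughly as follows. This assumes an in-order command queue and an already created buffer; read_via_map and the consume callback are hypothetical names used only for illustration.

```c
#include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */
#include <stddef.h>

/* Hypothetical helper: makes the current contents of buf visible to the host,
 * passes them to `consume`, then hands the region back to the runtime. */
cl_int read_via_map(cl_command_queue queue, cl_mem buf, size_t nbytes,
                    void (*consume)(const void *data, size_t nbytes))
{
    cl_int err;

    /* Blocking map (CL_TRUE): returns once the host may safely read the region. */
    void *p = clEnqueueMapBuffer(queue, buf, CL_TRUE, CL_MAP_READ,
                                 0, nbytes, 0, NULL, NULL, &err);
    if (err != CL_SUCCESS) return err;

    consume(p, nbytes);

    /* Unmap when done so the device may use the buffer again. */
    err = clEnqueueUnmapMemObject(queue, buf, p, 0, NULL, NULL);
    if (err != CL_SUCCESS) return err;

    /* Wait until the unmap has actually completed. */
    return clFinish(queue);
}
```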
Create the buffer you want to copy into using CL_MEM_READ_WRITE, but don't initialize it from your host. I recently had to initialize a fresh buffer to consecutive integers.
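A minimal sketch of such an allocation, assuming an existing context; the helper name create_device_ints and its parameters are hypothetical:

```c
#include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */
#include <stddef.h>

/* Hypothetical helper: allocates an uninitialized buffer of n ints.
 * The error code from clCreateBuffer is written to *err. */
cl_mem create_device_ints(cl_context ctx, size_t n, cl_int *err)
{
    /* No *_HOST_PTR flag and a NULL host pointer:
     * nothing is read from host memory when the buffer is created. */
    return clCreateBuffer(ctx, CL_MEM_READ_WRITE, n * sizeof(cl_int), NULL, err);
}
```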
A clCreateBuffer call with no host pointer doesn't do anything to your host's memory other than give you a handle to the memory object. I then use a kernel to assign the sequential values, because writing them in the graphics card's memory proved to be much faster than assigning the values on the CPU.
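One way such a fill kernel and its launch might look is sketched below. This is an illustration, not the code originally used: the kernel name fill_sequential, the helper and its parameters are all assumptions, and the program is rebuilt on every call purely to keep the sketch short.

```c
#include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */
#include <stddef.h>

/* Kernel source: work-item i writes the value i into element i. */
static const char *fill_src =
    "__kernel void fill_sequential(__global int *out) {\n"
    "    size_t i = get_global_id(0);\n"
    "    out[i] = (int)i;\n"
    "}\n";

/* Hypothetical helper: enqueues one work-item per element of buf (n ints). */
cl_int fill_sequential(cl_context ctx, cl_device_id dev,
                       cl_command_queue queue, cl_mem buf, size_t n)
{
    cl_int err;
    cl_program prog = clCreateProgramWithSource(ctx, 1, &fill_src, NULL, &err);
    if (err != CL_SUCCESS) return err;

    err = clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    if (err != CL_SUCCESS) { clReleaseProgram(prog); return err; }

    cl_kernel k = clCreateKernel(prog, "fill_sequential", &err);
    if (err == CL_SUCCESS) {
        err = clSetKernelArg(k, 0, sizeof(cl_mem), &buf);
        if (err == CL_SUCCESS)
            /* 1-D range of n work-items; the runtime picks the local size. */
            err = clEnqueueNDRangeKernel(queue, k, 1, NULL, &n, NULL,
                                         0, NULL, NULL);
        /* Safe to release now: the runtime keeps the kernel alive
         * until the enqueued command has finished. */
        clReleaseKernel(k);
    }
    clReleaseProgram(prog);
    return err;
}
```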
There is still no copy of the buffer in host memory at this point. I would need to use clEnqueueReadBuffer to copy it to the host.
You can easily modify this code to be a copying kernel rather than just straight assignment.
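As a rough sketch of that idea, a copying kernel could look like the source string below (the kernel name copy_ints is made up). If a plain buffer-to-buffer copy is all that is needed, the standard clEnqueueCopyBuffer call performs it without a custom kernel; the wrapper copy_on_device is a hypothetical name.

```c
#include <CL/cl.h>   /* <OpenCL/opencl.h> on macOS */
#include <stddef.h>

/* Copying kernel: work-item i copies element i from src to dst.
 * src could be a USE_HOST_PTR buffer, dst a plain CL_MEM_READ_WRITE buffer. */
const char *copy_src =
    "__kernel void copy_ints(__global const int *src, __global int *dst) {\n"
    "    size_t i = get_global_id(0);\n"
    "    dst[i] = src[i];\n"
    "}\n";

/* Alternative without a custom kernel: let the runtime copy nbytes
 * from the start of src to the start of dst. */
cl_int copy_on_device(cl_command_queue queue, cl_mem src, cl_mem dst,
                      size_t nbytes)
{
    return clEnqueueCopyBuffer(queue, src, dst, 0, 0, nbytes, 0, NULL, NULL);
}
```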