Looking at kernel launches within the code of CUDA Thrust, it seems they always use the default stream. Can I make Thrust use a stream of my choice? Am I missing something in the API?
相关问题
- Achieving the equivalent of a variable-length (loc
- The behavior of __CUDA_ARCH__ macro
- Setting Nsight to run with existing Makefile proje
- Usage of anonymous functions in arrayfun with GPU
- Does CUDA allow multiple applications on same gpu
相关文章
- How to downgrade to cuda 10.0 in arch linux?
- What's the relation between nvidia driver, cud
- How can I use 100% of VRAM on a secondary GPU from
- NVidia CUDA toolkit 7.5.27 failing to install on O
- How can I find row to all rows distance matrix bet
- thrust: fill isolate space
- How to get the real and imaginary parts of a compl
- Matrix Transpose (with shared Memory) with arbitar
I want to update the answer provided by talonmies following the release of Thrust 1.8 which introduces the possibility of indicating the CUDA execution stream as
see also
Thrust Release 1.8.0.
In the following, I'm recasting the example in
False dependency issue for the Fermi architecture
in terms of CUDA Thrust APIs.
The Utilities.cu and Utilities.cuh files needed to run such an example are maintained at this github page.
The Visual Profiler timeline shows the concurrency of CUDA Thrust operations and memory transfers
No you are not missing anything (at least up to the release snapshot which ships with CUDA 6.0).
The original Thrust tag based dispatch system deliberately abstracts all of the underlying CUDA API calls away, sacrificing some performance for ease of use and consistency (keep in mind that thrust has backends other than CUDA). If you want that level of flexibility, you will need to try another library (CUB, for example).
In versions since the CUDA 7.0 snapshot it has become possible to set a stream of choice for thrust operations via the execution policy and dispatch feature.