Currently I'm working with two GTX 650 cards. My program uses a simple client/server structure: I distribute worker threads across the two GPUs, and the server thread needs to gather the result vectors from the client threads, so I need to copy memory between the two GPUs. Unfortunately, the simple P2P program in the CUDA samples just doesn't work because my cards don't have TCC drivers. After spending two hours searching on Google and SO, I can't find the answer. Some sources say I should use cudaMemcpyPeer, and other sources say I should use cudaMemcpy with cudaMemcpyDefault. Is there some simple way to get my work done other than copying to the host and then copying to the device? I know it must be documented somewhere, but I can't find it. Thank you for your help.
Transferring data from one GPU to another will often require staging through host memory. The exception to this is when the GPUs and the system topology support peer-to-peer (P2P) access and P2P has been explicitly enabled. In that case, data transfers can flow directly over the PCIe bus from one GPU to another.
In either case (with or without P2P being available/enabled), the typical CUDA runtime API call would be cudaMemcpyPeer/cudaMemcpyPeerAsync, as demonstrated in the CUDA p2pBandwidthLatencyTest sample code. On Windows, one of the requirements of P2P is that both devices be supported by a driver in TCC mode. TCC mode is, for the most part, not an available option for GeForce GPUs (recently, an exception has been made for GeForce Titan family GPUs using the drivers and runtime available in the CUDA 7.5RC toolkit).
Therefore, on Windows, these GPUs will not be able to take advantage of direct P2P transfers. Nevertheless, a nearly identical sequence can be used to transfer data. The CUDA runtime will detect the nature of the transfer, and perform an allocation "under the hood" to create a staging buffer. The transfer will then be completed in 2 parts: a transfer from the originating device to the staging buffer, and a transfer from the staging buffer to the destination device.
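Whether direct P2P access is possible can be queried and, if so, enabled at runtime before any transfers are issued. A minimal sketch, assuming devices 0 and 1 (the flags argument to cudaDeviceEnablePeerAccess must be 0):

```cuda
int p2p01 = 0, p2p10 = 0;
cudaDeviceCanAccessPeer(&p2p01, 0, 1);   // can device 0 access device 1's memory?
cudaDeviceCanAccessPeer(&p2p10, 1, 0);   // and the reverse?
if (p2p01 && p2p10) {
    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);    // enable 0 -> 1 (flags must be 0)
    cudaSetDevice(1);
    cudaDeviceEnablePeerAccess(0, 0);    // enable 1 -> 0
}
// cudaMemcpyPeer/cudaMemcpyPeerAsync can be used either way;
// without P2P the runtime stages the copy through host memory.
```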
The following is a fully worked example showing how to transfer data from one GPU to another, while taking advantage of P2P access if it is available:
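(This is a minimal sketch of such a program, assuming devices 0 and 1 and a simple float vector; the CHECK macro and the fill kernel are illustrative helpers, and error handling is kept brief.)

```cuda
#include <cstdio>
#include <cuda_runtime.h>

#define CHECK(call) do { cudaError_t e = (call); if (e != cudaSuccess) { \
    printf("CUDA error \"%s\" at line %d\n", cudaGetErrorString(e), __LINE__); \
    return 1; } } while (0)

const int N = 1 << 20;

// simple kernel to initialize the source buffer on device 0
__global__ void fill(float *d, float val, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) d[i] = val;
}

int main()
{
    int ndev = 0;
    CHECK(cudaGetDeviceCount(&ndev));
    if (ndev < 2) { printf("This sketch needs two GPUs\n"); return 1; }

    // enable P2P in both directions if the hardware/driver allow it
    int p2p01 = 0, p2p10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&p2p01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&p2p10, 1, 0));
    if (p2p01 && p2p10) {
        CHECK(cudaSetDevice(0));
        CHECK(cudaDeviceEnablePeerAccess(1, 0));
        CHECK(cudaSetDevice(1));
        CHECK(cudaDeviceEnablePeerAccess(0, 0));
        printf("P2P enabled\n");
    } else {
        printf("P2P not available; the runtime will stage through host memory\n");
    }

    // allocate a buffer on each device and initialize the source on device 0
    float *d0 = 0, *d1 = 0;
    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc(&d0, N * sizeof(float)));
    fill<<<(N + 255) / 256, 256>>>(d0, 1.0f, N);
    CHECK(cudaDeviceSynchronize());

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc(&d1, N * sizeof(float)));

    // copy device 0 -> device 1; the same call works whether or not P2P is enabled
    CHECK(cudaMemcpyPeer(d1, 1, d0, 0, N * sizeof(float)));

    // verify the result on the host
    float *h = new float[N];
    CHECK(cudaMemcpy(h, d1, N * sizeof(float), cudaMemcpyDeviceToHost));
    bool ok = true;
    for (int i = 0; i < N; i++) if (h[i] != 1.0f) { ok = false; break; }
    printf("%s\n", ok ? "Success" : "Mismatch");
    delete[] h;
    return 0;
}
```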
Notes: