Can I use Quadro K4000 and K2000 for GPUDirect v2

2019-05-10 02:03发布

I use:

  • Single CPU (Intel Core i7-4820K Ivy Bridge-E) 40 Lanes of PCIe 3.0 + MotherBoard MSI X79A-GD65 (8D)
  • WindowsServer 2012, MSVS 2012 + CUDA 5.5 and compiled as 64-bit application
  • GPUs nVidia Quadro K4000 and K2000
  • All Quadros in TCC-mode (Tesla Compute Cluster)
  • nVidia Video Driver 332.50

simpleP2P-test shown that, all Quadros K4000 and K4000 - IS capable of Peer-to-Peer (P2P), but Peer-to-Peer (P2P) access - Quadro K4000 (GPU0) <-> Quadro K2000 (GPU1) : No.

Can I use Quadro K4000 and K2000 for GPUDirect v2 Peer-to-peer (P2P) communication?

[C:\ProgramData\NVIDIA Corporation\CUDA Samples\v5.5\0_Simple\simpleP2P../../bi n/win64/Release/simpleP2P.exe] - Starting... Checking for multiple GPUs... CUDA-capable device count: 3

GPU0 = " Quadro K4000" IS capable of Peer-to-Peer (P2P)

GPU1 = " Quadro K2000" IS capable of Peer-to-Peer (P2P)

GPU2 = " GeForce GT 640" NOT capable of Peer-to-Peer (P2P)

Checking GPU(s) for support of peer to peer memory access...

Peer-to-Peer (P2P) access from Quadro K4000 (GPU0) -> Quadro K2000 (GPU1) : No

Peer-to-Peer (P2P) access from Quadro K2000 (GPU1) -> Quadro K4000 (GPU0) : No

Two or more SM 2.0 class GPUs are required for C:\ProgramData\NVIDIA Corporation \CUDA Samples\v5.5\0_Simple\simpleP2P../../bin/win64/Release/simpleP2P.exe to r un. Support for UVA requires a GPU with SM 2.0 capabilities. Peer to Peer access is not available between GPU0 <-> GPU1, waiving test.

Quadros in TCC-mode:

nvidia-smi.exe"
Tue Mar 11 12:43:05 2014
+------------------------------------------------------+
| NVIDIA-SMI 5.320.57   Driver Version: 320.57         |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro K2000        TCC  | 0000:01:00.0     Off |                  N/A |
| 30%   30C    P8    N/A /  N/A |        6MB /  2047MB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GT 640     WDDM  | 0000:02:00.0     N/A |                  N/A |
| 40%   32C  N/A     N/A /  N/A |     2016MB /  2047MB |     N/A      Default |
+-------------------------------+----------------------+----------------------+
|   2  Quadro K4000        TCC  | 0000:03:00.0     Off |                  N/A |
| 30%   36C    P8    10W /  87W |        8MB /  3071MB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Compute processes:                                               GPU Memory |
|  GPU       PID  Process name                                     Usage      |
|=============================================================================|
|    1            Not Supported                                               |

In the documentation said that: https://developer.nvidia.com/gpudirect

GPUDirect eliminates unnecessary system memory copies, dramatically lowers CPU overhead, and reduces latency, resulting in significant performance improvements in data transfer times for applications running on NVIDIA Tesla™ and Quadro™ products.

More detailed specifications of Quadros there, but there are only about GPUDirect For Video, and nothing about P2P: http://www.nvidia.com/content/PDF/line_card/6660-nv-prographicssolutions-linecard-july13-final-lr.pdf

About PCIe bus:

nvidia-smi -q
GPU 0000:01:00.0
    Product Name                    : Quadro K2000
    PCI
        Bus                         : 0x01
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x0FFE10DE
        Bus Id                      : 0000:01:00.0
        Sub System Id               : 0x094C10DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 8x
    FB Memory Usage
        Total                       : 2047 MiB
        Used                        : 6 MiB
        Free                        : 2041 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default
...

GPU 0000:02:00.0
    Product Name                    : GeForce GT 640
    PCI
        Bus                         : 0x02
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x0FC110DE
        Bus Id                      : 0000:02:00.0
        Sub System Id               : 0x8A921462
        GPU Link Info
            PCIe Generation
                Max                 : N/A
                Current             : N/A
            Link Width
                Max                 : N/A
                Current             : N/A

...

GPU 0000:03:00.0
    Product Name                    : Quadro K4000
    PCI
        Bus                         : 0x03
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x11FA10DE
        Bus Id                      : 0000:03:00.0
        Sub System Id               : 0x097C10DE
        GPU Link Info
            PCIe Generation
                Max                 : 2
                Current             : 1
            Link Width
                Max                 : 16x
                Current             : 16x
    FB Memory Usage
        Total                       : 3071 MiB
        Used                        : 8 MiB
        Free                        : 3063 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 2 MiB
        Free                        : 254 MiB
    Compute Mode                    : Default

Can I use GPUDirect v2 P2P with Quadros, and if I can, then in which of these? Should be size of the BAR1 is equal to the size of GPU-RAM to be able to use P2P?

UPDATE 11.03.2014 23:16:

  1. I can't use P2P Direct Transfers - I transfered random generated data by using cudaMemcpy(gpu_ptr1, gpu_ptr0, cudaMemcpyDefault); successfully with 3 GB/sec on PCIe-gen2 8x (4 GB/sec theoretically), but function copies through the host - In VisualProfiler Context1(DtoH) and Context2(HtoD).
  2. I can't use P2P Direct Access by using __global__ Kernel(char *dst, char *src, size_t size) { int idx = blockIdx.x * blockDim.x + threadIdx.x; dst[idx] = src[idx]; } - I get an error when use function cudaDeviceEnablePeerAccess() and get 0 when using cudaDeviceCanAccessPeer()

1条回答
劫难
2楼-- · 2019-05-10 02:50

I don't know if it's related with your problem, but note this:

    GPU Link Info
        PCIe Generation
            Max                 : 2
            Current             : 1
        Link Width
            Max                 : 16x
            Current             : 8x

and this:

        PCIe Generation
            Max                 : 2
            Current             : 1
        Link Width
            Max                 : 16x
            Current             : 16x

that is, your PCIe links have been demoted from 2.0 (5 GT/s) to 1.0 (2.5 GT/s) and on one card from 16x to 8x... it's very possible that this is a problem for GPU direct too, but for sure it's not what you want, in order to squeeze all the performance from your PCIe (on one card you're getting 25% of the theoretical, 50% on the other one).

I have found that it's important the order where the card are put on the mothorboard; overheating can lead to downgrade of the buses too, or dust... planets alignment too probably....

EDIT: I didn't know that TCC was mandatory for GPU direct to work, so the following is not valid.

First of all I'd try to remove the display card and see if using only the two quadro cards you get all PCIe 2.0/16x, and whether in this case GPU direct starts working or not.

EDIT: from your additional information: "and because in motherboard the monitor must be connected to the card in first slot (which with 16 PCIe-Lanes), then I have: 16x-GeForce, 16x-Quadro K4000 and 8x-Quadro K2000"

Well fortunately it's not true (or at least, is not what is reported in the manual of your motherboad):

lower part of pag 18 of the manual of the motherboard

So the correct place to attach the monitor to is to slot PCI_E6, the 8x one... good luck swapping cards.

Congrats for your question being so precise - that's helped a lot (note - still don't know if it solve... keep us informed!).

查看更多
登录 后发表回答