Why is it not possible to overlap memHtoD with GPU

Posted 2019-07-30 13:44

Question:

I tested my GTX 590 and GTX 680 with the CUDA SDK sample "simpleStreams". The timeline results are shown in the pictures. Can anyone explain why, on the GTX 590, memcpyDtoH cannot overlap with the preceding kernel computation, whereas it does on the GTX 680?

Answer 1:

I get similar behavior with my GTX 480. I suspect something is wrong with Fermi. Maybe it is related to WDDM? (I am using Windows 7 x64.)

I have tried many different drivers, and all of them show the same wrong behavior. You have now tested a GK104 and shown that it works correctly, and I have already tested an old 8800 GTS, where it does indeed work. It seems the Fermi cards don't work :/

edit:

See also: How can I overlap memory transfers and kernel execution in a CUDA application?
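
For anyone who wants to reproduce the comparison outside the SDK sample, here is a minimal sketch of the pattern being discussed: pinned host memory, several non-default streams, and an asynchronous copy in, a kernel, and an asynchronous copy out issued per stream. It is not the simpleStreams code itself; the kernel `addConstant`, the chunk size, and the stream count are placeholders chosen for illustration. It also prints `asyncEngineCount`, which reports how many copy directions the device can run concurrently with a kernel. Whether the profiler timeline actually shows overlap still depends on the card, the driver model, and the order in which the work is issued, so treat this as a test harness rather than a guarantee.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only: adds a constant to each element.
__global__ void addConstant(int *data, int n, int value) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += value;
}

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);
    // 0: no copy/kernel overlap, 1: one copy direction can overlap a kernel,
    // 2: both copy directions can run concurrently with a kernel.
    printf("%s: asyncEngineCount = %d\n", prop.name, prop.asyncEngineCount);

    const int nStreams = 4;        // assumed value, not from the original post
    const int chunk = 1 << 20;     // elements per stream, assumed value
    const int n = nStreams * chunk;
    const size_t chunkBytes = chunk * sizeof(int);

    int *hostData = nullptr;
    int *devData = nullptr;
    cudaMallocHost(&hostData, n * sizeof(int));  // pinned memory, needed for async copies
    cudaMalloc(&devData, n * sizeof(int));
    for (int i = 0; i < n; ++i) hostData[i] = i;

    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    // Each stream gets: copy in, kernel, copy out. Work in different streams
    // is allowed (not guaranteed) to overlap.
    for (int s = 0; s < nStreams; ++s) {
        int offset = s * chunk;
        cudaMemcpyAsync(devData + offset, hostData + offset, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        addConstant<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(devData + offset, chunk, 1);
        cudaMemcpyAsync(hostData + offset, devData + offset, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    cudaDeviceSynchronize();

    // Check that every element was incremented exactly once.
    bool ok = true;
    for (int i = 0; i < n; ++i) {
        if (hostData[i] != i + 1) { ok = false; break; }
    }
    printf("result is %s\n", ok ? "correct" : "wrong");

    for (int s = 0; s < nStreams; ++s) cudaStreamDestroy(streams[s]);
    cudaFree(devData);
    cudaFreeHost(hostData);
    return ok ? 0 : 1;
}
```

Running this under the profiler on both cards should give timelines comparable to the ones in the question. If the overlap on the Fermi card is missing here too, that would point at the device or driver rather than at the SDK sample.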