When creating a CUDA event, you can optionally turn on the cudaEventBlockingSync
flag. But what is the difference between creating an event with or without the flag? I read the fine manual; it just doesn't make sense to me. What is the "calling host thread", and what "blocks" when you don't use the flag?
4.6.2.7 cudaError_t cudaEventSynchronize(cudaEvent_t event)
Blocks until the event has actually been recorded. ... Waiting for an event that was created with the cudaEventBlockingSync flag will cause the calling host thread to block until the event has actually been recorded.
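For concreteness, this is what I mean by "with or without the flag" (a minimal sketch; cudaEventCreateWithFlags is the call that accepts flags):

    cudaEvent_t evDefault, evBlocking;

    // Created without the flag (default behaviour).
    cudaEventCreate(&evDefault);

    // Created with the flag in question.
    cudaEventCreateWithFlags(&evBlocking, cudaEventBlockingSync);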
cudaEventBlockingSync determines how the host will wait for the event to happen.

When cudaEventBlockingSync is SET, the waiting host thread gives up the CPU: the OS can schedule a different thread (possibly from another process) on that core, and the host thread re-acquires the CPU at a later time. With this approach the host thread does not monopolize CPU time, and the host is free to do other work.

When cudaEventBlockingSync is NOT SET, the host thread busy-waits, i.e. it enters a check-event loop. The CPU just spins, polling for the event to occur, which usually pegs the CPU performance meter at 100%. With this approach the host thread monopolizes that CPU time.

Not setting cudaEventBlockingSync gives the minimum latency from the end of kernel execution to control returning to the thread. Which setting you want depends on what the kernel is doing, i.e. how long the event will take to happen versus how much scheduling overhead is involved in blocking and waking the thread. Not setting the flag comes at the cost of not being able to do any other CPU work (other threads) on that core while waiting for the event to occur.

When you call cudaEventSynchronize, the calling thread stops executing until the event has happened, at which point the program continues. It is a way of making sure you know the state of the running program, which is especially important in CUDA because so many things are asynchronous.
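To make that concrete, here is a minimal sketch of the usual pattern (myKernel, grid, block and d_data are placeholders, not anything from your code):

    cudaEvent_t done;

    // With the flag, the cudaEventSynchronize() call below yields the CPU;
    // create the event with cudaEventCreate(&done) instead and it will spin.
    cudaEventCreateWithFlags(&done, cudaEventBlockingSync);

    myKernel<<<grid, block>>>(d_data);   // asynchronous: control returns to the host immediately
    cudaEventRecord(done, 0);            // enqueue the event after the kernel in stream 0

    cudaEventSynchronize(done);          // the calling host thread waits here until the event is recorded
    cudaEventDestroy(done);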
The "calling host thread" is the thread that is running on the CPU of the host computer in which the CUDA device resides.
Edit in response to a comment below:
I believe that the difference between a "blocking sync" and a regular sync is that the thread blocks and will not run until the event is completed, as opposed to a thread that "spins" as it waits, constantly checking the value. This means that the thread will not use any extra CPU time spinning, but will instead be awakened once the event is completed. This is useful if, say, you're running this program on a server where CPU time is at a premium or you have to pay per unit time.
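As a rough sketch of why that matters: with a blocking-sync event the waiting thread sleeps inside cudaEventSynchronize(), so another host thread can use that core in the meantime (doOtherCpuWork() is just a placeholder for any CPU-side job):

    #include <thread>

    cudaEvent_t done;
    cudaEventCreateWithFlags(&done, cudaEventBlockingSync);

    myKernel<<<grid, block>>>(d_data);
    cudaEventRecord(done, 0);

    std::thread worker(doOtherCpuWork);  // CPU work proceeds on another thread
    cudaEventSynchronize(done);          // this thread sleeps instead of spinning on a core
    worker.join();

    cudaEventDestroy(done);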