Inter-block synchronization in CUDA

2019-06-14 11:18发布

I've searched a month for this problem. I cannot synchronize blocks in CUDA.

I've read a lot of posts about atomicAdd, cooperative groups, etc. I decided to use an global array so a block could write on one element of global array. After this writing, a thread of block waits(i.e. trapped in a while loop) until all blocks write global array.

When I used 3 blocks my synchronization works well (because I have 3 SM). But using 3 blocks gives me 12% occupancy. So I need to use more blocks, but they can't be synchronized. The problem is: a block on a SM waits for other blocks, so the SM can't get another block.

What can I do? How can synchronize blocks when there are blocks more than the number of SMs?

CUDA-GPU specification: CC. 6.1, 3 SM, windows 10, VS2015, GeForce MX150 graphic card. Please help me for this problem. I used a lot of codes but none of them works.

标签： parallel-processing cuda nvidia gpu-programming

1条回答

We Are One

2楼-- · 2019-06-14 12:00

The CUDA programming model methods to do inter-block synchronization are

(implicit) Use the kernel launch itself. Before the kernel launch, or after it completes, all blocks (in the launched kernel) are synchronized to a known state. This is conceptually true whether the kernel is launched from host code or as part of CUDA Dynamic Parallelism launch.
(explicit) Use a grid sync in CUDA Cooperative groups. This has a variety of requirements for support, which you are starting to explore in your other question. The simplest definition of support is if the appropriate property is set (cooperativeLaunch). You can query the property programmatically, using cudaGetDeviceProperties().

0人赞添加讨论(0) 举报

Inter-block synchronization in CUDA

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间