Impossible to acquire and present in parallel with

Posted 2019-07-07 17:35

Note: I'm self-learning Vulkan with little knowledge of modern OpenGL.

Reading the Vulkan specifications, I can see very nice semaphores that allow the command buffer and the swapchain to synchronize. Here's what I understand to be a simple (yet I think inefficient) way of doing things:

  1. Get image with vkAcquireNextImageKHR, signalling sem_post_acq
  2. Build command buffer (or use pre-built) with:
    • Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
    • render
    • Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
  3. Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
  4. vkQueuePresentKHR waiting on sem_pre_present.

The problem here is that the image barriers in the command buffer must know which image they are transitioning, which means that vkAcquireNextImageKHR must return before one knows how to build the command buffer (or which pre-built command buffer to submit). But vkAcquireNextImageKHR could potentially sleep a lot (because the presentation engine is busy and there are no free images). On the other hand, the submission of the command buffer is costly itself, and more importantly, all stages before fragment can run without having any knowledge of which image the final result will be rendered to.

Theoretically, it seems to me that a scheme like the following would allow a higher degree of parallelism:

  1. Build command buffer (or use pre-built) with:
    • Image barrier to transition image away from VK_IMAGE_LAYOUT_UNDEFINED
    • render
    • Image barrier to transition image to VK_IMAGE_LAYOUT_PRESENT_SRC_KHR
  2. Submit to queue, waiting on sem_post_acq on fragment stage and signalling sem_pre_present.
  3. Get image with vkAcquireNextImageKHR, signalling sem_post_acq
  4. vkQueuePresentKHR waiting on sem_pre_present.

Which would, again theoretically, allow the pipeline to execute all the way up to the fragment shader, while we wait for vkAcquireNextImageKHR. The only reason this doesn't work is that it is neither possible to tell the command buffer that this image will be determined later (with proper synchronization), nor is it possible to ask the presentation engine for a specific image.
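Here is an idealized cost model in C that puts rough numbers on the hoped-for gain. It assumes the pre-fragment stages can genuinely overlap the acquire wait and that stage durations are independent. Real hardware guarantees neither. The function names are made up for illustration.

```c
#include <stdint.h>

/* Idealized frame-time model (all times in microseconds). Hypothetical
 * helper names; assumes pre-fragment work really can overlap the wait. */
static uint32_t max_u32(uint32_t a, uint32_t b)
{
    return a > b ? a : b;
}

/* First scheme: wait for the image, then run the whole pipeline. */
uint32_t frame_time_serial(uint32_t acquire_wait, uint32_t geometry,
                           uint32_t fragment)
{
    return acquire_wait + geometry + fragment;
}

/* Second scheme: pre-fragment stages run during the acquire wait;
 * fragment work starts once both the wait and the geometry are done. */
uint32_t frame_time_overlapped(uint32_t acquire_wait, uint32_t geometry,
                               uint32_t fragment)
{
    return max_u32(acquire_wait, geometry) + fragment;
}
```

Under this model, the saving is min(acquire_wait, geometry). It only matters when the acquire wait and the geometry work are both significant.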


My first question is: is my analysis correct? If so, is such an optimization not possible in Vulkan at all and why not?


My second question is: wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself? That way, you could know in advance which image you are going to ask for, and build and submit your command buffer accordingly.

Tags: vulkan
2 answers
唯我独甜
Reply #2 · 2019-07-07 17:53

Your entire question is predicated on the assumption that you cannot do any command buffer building work without a specific swapchain image. That's not true at all.

First, you can always build secondary command buffers; providing a VkFramebuffer (in VkCommandBufferInheritanceInfo) is merely a courtesy, not a requirement. This matters a lot if you want to use Vulkan to improve CPU performance. After all, being able to build command buffers in parallel is one of Vulkan's selling points. Recording only a single command buffer is something of a waste for a performance-conscious application.

In such a case, only the primary command buffer needs the actual image.
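Below is a toy model of that split. It is written in plain C so it compiles without the Vulkan SDK. The secondary "command buffers" are filled before any image is known, and only the primary records which framebuffer to use. All types and functions here are hypothetical stand-ins.

In real Vulkan, the secondaries would be begun with VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT. Their VkCommandBufferInheritanceInfo would set framebuffer to VK_NULL_HANDLE. The primary would call vkCmdExecuteCommands inside a render pass begun with VK_SUBPASS_CONTENTS_SECONDARY_COMMAND_BUFFERS.

```c
#include <stdint.h>

/* Toy model, not the Vulkan API: a command buffer is reduced to the
 * state that matters for this discussion. */
#define MAX_SECONDARIES 8

typedef struct {
    int draw_calls;                  /* recorded without any framebuffer */
} Secondary;

typedef struct {
    uint32_t framebuffer_index;      /* the only image-dependent part */
    const Secondary *secondaries[MAX_SECONDARIES];
    int secondary_count;
} Primary;

/* Can run before (or on other threads in parallel with) the acquire. */
void record_secondary(Secondary *sec, int draw_calls)
{
    sec->draw_calls = draw_calls;
}

/* Runs after acquire: target the acquired image's framebuffer and
 * reference the pre-recorded secondaries. Returns -1 if too many. */
int record_primary(Primary *prim, uint32_t image_index,
                   const Secondary *secs, int n)
{
    if (n < 0 || n > MAX_SECONDARIES)
        return -1;
    prim->framebuffer_index = image_index;
    prim->secondary_count = n;
    for (int i = 0; i < n; ++i)
        prim->secondaries[i] = &secs[i];
    return 0;
}

/* Sums the work referenced by the primary. */
int total_draw_calls(const Primary *prim)
{
    int total = 0;
    for (int i = 0; i < prim->secondary_count; ++i)
        total += prim->secondaries[i]->draw_calls;
    return total;
}
```

Only record_primary takes an image index. Everything passed to record_secondary is independent of the swapchain.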

Second, who says that you will be doing the majority of your rendering to the presentable image? If you're doing deferred rendering, most of your stuff will be written to deferred buffers. Even post-processing effects like tone-mapping, SSAO, and so forth will probably be done to an intermediate buffer.

Worst-case scenario, you can always render to your own image. Then you build a command buffer whose only content is an image copy from your image to the presentable one.
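Here is a minimal sketch of that worst case, using plain arrays as stand-in images (all names hypothetical). The expensive pass writes an offscreen image before acquisition, and the only image-dependent step is a copy. In real Vulkan, the copy would be vkCmdCopyImage or vkCmdBlitImage. The swapchain must also have been created with VK_IMAGE_USAGE_TRANSFER_DST_BIT in its imageUsage.

```c
#include <stdint.h>
#include <string.h>

/* Toy images: tiny pixel arrays standing in for VkImages. */
#define IMG_W 4
#define IMG_H 2
#define SWAPCHAIN_LEN 3

typedef struct { uint32_t px[IMG_W * IMG_H]; } Image;

/* Heavy work: needs no swapchain image, so it can be submitted
 * before vkAcquireNextImageKHR returns. */
void render_scene(Image *offscreen, uint32_t frame)
{
    for (int i = 0; i < IMG_W * IMG_H; ++i)
        offscreen->px[i] = frame * 100u + (uint32_t)i;
}

/* Cheap final step, recorded after acquire: stands in for a
 * copy/blit into the acquired swapchain image. */
void copy_to_swapchain(Image swapchain[SWAPCHAIN_LEN], uint32_t image_index,
                       const Image *offscreen)
{
    if (image_index >= SWAPCHAIN_LEN)
        return;
    memcpy(&swapchain[image_index], offscreen, sizeof *offscreen);
}
```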

> all stages before fragment can run without having any knowledge of which image the final result will be rendered to.

You assume that the hardware has a strict separation between vertex processing and rasterization. This is true only for tile-based hardware.

Direct renderers just execute the whole pipeline, top to bottom, for each rendering command. They don't store post-transformed vertex data in large buffers. It just flows down to the next step. So if the "fragment stage" has to wait on a semaphore, then you can assume that all other stages will be idle as well while waiting.

> wouldn't it have made more sense if you could tell vkAcquireNextImageKHR which particular image you want to acquire, and iterate through them yourself?

No. The implementation would be unable to decide which image to give you next. This is precisely why you have to ask for an image: so that the implementation can figure out on its own which image it is safe for you to have.
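A toy presentation engine illustrates why. It is a simplified model with hypothetical names, not how any driver is written. Images move between available, acquired and presenting, and only the engine learns when the display releases one. With VK_PRESENT_MODE_MAILBOX_KHR, for instance, a newer image can replace a queued one. Images can therefore become free out of order, and an index chosen by the application in round-robin order may still be in use.

```c
/* Toy presentation engine; hypothetical, not a real API. */
#define NUM_IMAGES 3

typedef enum { IMG_AVAILABLE, IMG_ACQUIRED, IMG_PRESENTING } ImgState;

typedef struct {
    ImgState state[NUM_IMAGES];
} ToyEngine;

/* Hands out whichever image the engine knows is free, or -1 if none
 * is (a real acquire would block, time out, or return VK_NOT_READY). */
int toy_acquire(ToyEngine *e)
{
    for (int i = 0; i < NUM_IMAGES; ++i) {
        if (e->state[i] == IMG_AVAILABLE) {
            e->state[i] = IMG_ACQUIRED;
            return i;
        }
    }
    return -1;
}

/* The application gives an acquired image back for display. */
void toy_present(ToyEngine *e, int i)
{
    if (i >= 0 && i < NUM_IMAGES && e->state[i] == IMG_ACQUIRED)
        e->state[i] = IMG_PRESENTING;
}

/* Only the engine knows when the display is done with an image. */
void toy_release(ToyEngine *e, int i)
{
    if (i >= 0 && i < NUM_IMAGES && e->state[i] == IMG_PRESENTING)
        e->state[i] = IMG_AVAILABLE;
}
```

Suppose all three images have been presented and the engine frees image 1 first. Acquire then returns 1, while an application iterating by itself would have asked for image 0, which is still on screen.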

Also, there's specific language in the specification saying that the semaphore and/or fence you provide must not only be unsignaled, but must also have no outstanding operations pending on them. Why?

Because vkAcquireNextImageKHR can fail. If a queue has an operation waiting on a semaphore that will never be signaled, that is not allowed, and in practice it can hang the queue. You have to successfully acquire first, then submit work that waits on the semaphore.
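The rule boils down to checking the acquire result before submitting anything that waits on the semaphore. The enum below is a hypothetical stand-in for the relevant VkResult codes, so the sketch compiles without the Vulkan SDK. According to the specification, only VK_SUCCESS and VK_SUBOPTIMAL_KHR mean an image was actually acquired.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the VkResult codes vkAcquireNextImageKHR
 * can return. */
typedef enum {
    ACQ_SUCCESS,        /* VK_SUCCESS */
    ACQ_SUBOPTIMAL,     /* VK_SUBOPTIMAL_KHR: acquired, but swapchain no longer matches */
    ACQ_NOT_READY,      /* VK_NOT_READY: timeout was 0 and no image was free */
    ACQ_TIMEOUT,        /* VK_TIMEOUT: no image within the timeout */
    ACQ_OUT_OF_DATE,    /* VK_ERROR_OUT_OF_DATE_KHR */
    ACQ_SURFACE_LOST    /* VK_ERROR_SURFACE_LOST_KHR */
} AcquireResult;

/* True only when an image was acquired, i.e. when the semaphore passed
 * to acquire will be signaled; only then may a submit wait on it. */
bool acquire_signals_semaphore(AcquireResult r)
{
    return r == ACQ_SUCCESS || r == ACQ_SUBOPTIMAL;
}

/* True when the swapchain should be recreated (for SUBOPTIMAL, this can
 * wait until after the current frame has been presented). */
bool needs_swapchain_recreate(AcquireResult r)
{
    return r == ACQ_SUBOPTIMAL || r == ACQ_OUT_OF_DATE;
}
```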

Generally speaking, if you're regularly having trouble getting presentable images in a timely fashion, you need to make your swapchain longer. That's the point of having multiple buffers, after all.
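One common way to do that, sketched here as an assumption rather than anything the answer prescribes, is to request a few images more than VkSurfaceCapabilitiesKHR::minImageCount, clamped to maxImageCount. A maxImageCount of 0 means there is no upper limit. The result would go into VkSwapchainCreateInfoKHR::minImageCount, and the implementation may still create more images than requested.

```c
#include <stdint.h>

/* Hypothetical helper: surface minimum plus `extra`, clamped to the
 * surface maximum. max_image_count == 0 means "no upper limit". */
uint32_t choose_image_count(uint32_t min_image_count,
                            uint32_t max_image_count, uint32_t extra)
{
    uint32_t count = min_image_count + extra;
    if (max_image_count != 0 && count > max_image_count)
        count = max_image_count;
    return count;
}
```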

Bombasti
Reply #3 · 2019-07-07 18:18

Like Nicol said, you can record secondary command buffers independently of which image they will render to.

However, you can take it a step further: record command buffers for all swapchain images in advance, and select the one to submit based on the acquired image index.

This type of reuse requires some extra care, because all the memory ranges it uses are baked into the command buffer. But in many situations the render commands don't actually change from one frame to the next; only a little of the data they use does.

So the sequence of such a frame would be:

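// Block until an image is available; a timeout of 0 would return VK_NOT_READY when none is free.
VkResult res = vkAcquireNextImageKHR(vk.dev, vk.swap, UINT64_MAX, vk.acquire, VK_NULL_HANDLE, &vk.image_ind);
if (res != VK_SUCCESS && res != VK_SUBOPTIMAL_KHR)
    return; // e.g. VK_ERROR_OUT_OF_DATE_KHR: recreate the swapchain, submit nothing that waits on vk.acquire

// Wait until the GPU is done with this image's command buffer and staging memory (fences created signaled).
vkWaitForFences(vk.dev, 1, &vk.fences[vk.image_ind], VK_TRUE, UINT64_MAX);
vkResetFences(vk.dev, 1, &vk.fences[vk.image_ind]); // a fence must be unsignaled when passed to vkQueueSubmit

engine_update_render_data(vk.mapped_staging[vk.image_ind]);

VkSubmitInfo submit = build_submit(vk.acquire, vk.rend_cmd[vk.image_ind], vk.present);
vkQueueSubmit(vk.rend_queue, 1, &submit, vk.fences[vk.image_ind]);

VkPresentInfoKHR present = build_present(vk.present, vk.swap, vk.image_ind);
vkQueuePresentKHR(vk.queue, &present);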

Granted, this does not allow for conditional rendering, but the GPU is generally fast enough to render some off-screen geometry without noticeable delay. So until the player reaches a loading zone where new geometry has to be displayed, you can keep those command buffers alive.
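The "keep them alive until something changes" policy can be sketched as a small cache. There is one pre-recorded command buffer per swapchain image, and each is re-recorded only when the scene version changes, for example at a loading zone. The types are hypothetical stand-ins for real VkCommandBuffer handles and recording code.

```c
#include <stddef.h>
#include <stdint.h>

#define SWAPCHAIN_LEN 3

/* Toy stand-in for one pre-recorded command buffer per swapchain image. */
typedef struct {
    uint32_t image_index;    /* framebuffer baked into the recording */
    uint32_t scene_version;  /* geometry set it was recorded against */
} RecordedCmd;

typedef struct {
    RecordedCmd cmd[SWAPCHAIN_LEN];
    uint32_t scene_version;  /* bumped when geometry changes */
    uint32_t re_records;     /* how many (re)recordings happened */
} Renderer;

/* Returns the command buffer to submit for the acquired image,
 * re-recording it only if the scene changed since it was recorded. */
const RecordedCmd *select_cmd(Renderer *r, uint32_t image_index)
{
    if (image_index >= SWAPCHAIN_LEN)
        return NULL;
    RecordedCmd *c = &r->cmd[image_index];
    if (c->scene_version != r->scene_version) {
        c->image_index = image_index;
        c->scene_version = r->scene_version;
        r->re_records++;
    }
    return c;
}
```

In a real renderer, the re-record step must also wait on that image's fence first. A command buffer cannot be re-recorded while the GPU may still be executing it.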
