Note: I'm self-learning Vulkan with little knowledge of modern OpenGL.
Reading the Vulkan specifications, I can see very nice semaphores that allow the command buffer and the swapchain to synchronize. Here's what I understand to be a simple (yet I think inefficient) way of doing things:
- Get image with `vkAcquireNextImageKHR`, signalling `sem_post_acq`
- Build command buffer (or use pre-built) with:
  - Image barrier to transition image away from `VK_IMAGE_LAYOUT_UNDEFINED`
  - render
  - Image barrier to transition image to `VK_IMAGE_LAYOUT_PRESENT_SRC_KHR`
- Submit to queue, waiting on `sem_post_acq` on fragment stage and signalling `sem_pre_present`.
- `vkQueuePresentKHR` waiting on `sem_pre_present`.
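In code, that sequence looks roughly like the following minimal sketch (assuming one pre-recorded command buffer per swapchain image; the device, swapchain, queue, semaphores, and command buffers are created elsewhere):

```c
#include <vulkan/vulkan.h>

/* One frame of the "acquire first" scheme. All handles are assumed to have
 * been created elsewhere; cmdBuffers holds one pre-recorded command buffer
 * per swapchain image. */
void draw_frame(VkDevice device, VkSwapchainKHR swapchain, VkQueue queue,
                const VkCommandBuffer *cmdBuffers,
                VkSemaphore sem_post_acq, VkSemaphore sem_pre_present)
{
    /* 1. Acquire: may block until the presentation engine frees an image. */
    uint32_t imageIndex;
    vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                          sem_post_acq, VK_NULL_HANDLE, &imageIndex);

    /* 2. Submit: wait on sem_post_acq at the fragment stage, signal
     *    sem_pre_present when rendering finishes. */
    VkPipelineStageFlags waitStage = VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT;
    VkSubmitInfo submit = {
        .sType = VK_STRUCTURE_TYPE_SUBMIT_INFO,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &sem_post_acq,
        .pWaitDstStageMask = &waitStage,
        .commandBufferCount = 1,
        .pCommandBuffers = &cmdBuffers[imageIndex],
        .signalSemaphoreCount = 1,
        .pSignalSemaphores = &sem_pre_present,
    };
    vkQueueSubmit(queue, 1, &submit, VK_NULL_HANDLE);

    /* 3. Present: the presentation engine waits on sem_pre_present. */
    VkPresentInfoKHR present = {
        .sType = VK_STRUCTURE_TYPE_PRESENT_INFO_KHR,
        .waitSemaphoreCount = 1,
        .pWaitSemaphores = &sem_pre_present,
        .swapchainCount = 1,
        .pSwapchains = &swapchain,
        .pImageIndices = &imageIndex,
    };
    vkQueuePresentKHR(queue, &present);
}
```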
The problem here is that the image barriers in the command buffer must know which image they are transitioning, which means that `vkAcquireNextImageKHR` must return before one knows how to build the command buffer (or which pre-built command buffer to submit). But `vkAcquireNextImageKHR` could potentially sleep a lot (because the presentation engine is busy and there are no free images). On the other hand, the submission of the command buffer is costly in itself, and more importantly, all stages before fragment can run without any knowledge of which image the final result will be rendered to.
Theoretically, it seems to me that a scheme like the following would allow a higher degree of parallelism:
- Build command buffer (or use pre-built) with:
  - Image barrier to transition image away from `VK_IMAGE_LAYOUT_UNDEFINED`
  - render
  - Image barrier to transition image to `VK_IMAGE_LAYOUT_PRESENT_SRC_KHR`
- Submit to queue, waiting on `sem_post_acq` on fragment stage and signalling `sem_pre_present`.
- Get image with `vkAcquireNextImageKHR`, signalling `sem_post_acq`
- `vkQueuePresentKHR` waiting on `sem_pre_present`.
This would, again theoretically, allow the pipeline to execute all the way up to the fragment shader while we wait for `vkAcquireNextImageKHR`. The only reason this doesn't work is that it is neither possible to tell the command buffer that the image will be determined later (with proper synchronization), nor possible to ask the presentation engine for a specific image.
My first question is: is my analysis correct? If so, is such an optimization not possible in Vulkan at all and why not?
My second question is: wouldn't it have made more sense if you could tell `vkAcquireNextImageKHR` which particular image you want to acquire, and iterate through them yourself? That way, you would know in advance which image you are going to ask for, and could build and submit your command buffer accordingly.
Your entire question is predicated on the assumption that you cannot do any command buffer building work without a specific swapchain image. That's not true at all.
First, you can always build secondary command buffers; providing a `VkFramebuffer` is merely a courtesy, not a requirement. And this is very important if you want to use Vulkan to improve CPU performance. After all, being able to build command buffers in parallel is one of the selling points of Vulkan. For you to only be creating one is something of a waste for a performance-conscious application. In such a case, only the primary command buffer needs the actual image.
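As a sketch of what that looks like (the render pass and command buffer handles here are placeholders created elsewhere), the inheritance info's framebuffer member can simply be left null:

```c
#include <vulkan/vulkan.h>

/* Record a secondary command buffer without knowing the framebuffer.
 * Only the render pass (and subpass) it will execute inside is needed;
 * the framebuffer member may legally be VK_NULL_HANDLE. */
void record_secondary(VkCommandBuffer secondary, VkRenderPass renderPass)
{
    VkCommandBufferInheritanceInfo inherit = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_INHERITANCE_INFO,
        .renderPass = renderPass,
        .subpass = 0,
        .framebuffer = VK_NULL_HANDLE, /* "merely a courtesy" */
    };
    VkCommandBufferBeginInfo begin = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        .flags = VK_COMMAND_BUFFER_USAGE_RENDER_PASS_CONTINUE_BIT,
        .pInheritanceInfo = &inherit,
    };
    vkBeginCommandBuffer(secondary, &begin);
    /* ... vkCmdBindPipeline, vkCmdDraw, etc. — none of this needs the
     * presentable image ... */
    vkEndCommandBuffer(secondary);
}
```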
Second, who says that you will be doing the majority of your rendering to the presentable image? If you're doing deferred rendering, most of your stuff will be written to deferred buffers. Even post-processing effects like tone-mapping, SSAO, and so forth will probably be done to an intermediate buffer.
Worst-case scenario, you can always render to your own image. Then you build a command buffer whose only content is an image copy from your image to the presentable one.
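A minimal sketch of that worst case, assuming the offscreen image has already been rendered and transitioned to `VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL` (handles and extents are placeholders):

```c
#include <vulkan/vulkan.h>

/* Tiny command buffer that only copies the offscreen image into the
 * just-acquired swapchain image and makes it presentable. */
void record_copy_to_swapchain(VkCommandBuffer cmd, VkImage offscreenImage,
                              VkImage swapchainImage,
                              uint32_t width, uint32_t height)
{
    VkCommandBufferBeginInfo begin = {
        .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        .flags = VK_COMMAND_BUFFER_USAGE_ONE_TIME_SUBMIT_BIT,
    };
    vkBeginCommandBuffer(cmd, &begin);

    /* UNDEFINED -> TRANSFER_DST_OPTIMAL for the presentable image. */
    VkImageMemoryBarrier toDst = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = 0,
        .dstAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_UNDEFINED,
        .newLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = swapchainImage,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };
    vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_TOP_OF_PIPE_BIT,
                         VK_PIPELINE_STAGE_TRANSFER_BIT, 0,
                         0, NULL, 0, NULL, 1, &toDst);

    /* The actual copy: the only "rendering" this command buffer does. */
    VkImageCopy region = {
        .srcSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },
        .dstSubresource = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 0, 1 },
        .extent = { width, height, 1 },
    };
    vkCmdCopyImage(cmd, offscreenImage, VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL,
                   swapchainImage, VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL,
                   1, &region);

    /* TRANSFER_DST_OPTIMAL -> PRESENT_SRC_KHR so it can be presented. */
    VkImageMemoryBarrier toPresent = toDst;
    toPresent.srcAccessMask = VK_ACCESS_TRANSFER_WRITE_BIT;
    toPresent.dstAccessMask = 0;
    toPresent.oldLayout = VK_IMAGE_LAYOUT_TRANSFER_DST_OPTIMAL;
    toPresent.newLayout = VK_IMAGE_LAYOUT_PRESENT_SRC_KHR;
    vkCmdPipelineBarrier(cmd, VK_PIPELINE_STAGE_TRANSFER_BIT,
                         VK_PIPELINE_STAGE_BOTTOM_OF_PIPE_BIT, 0,
                         0, NULL, 0, NULL, 1, &toPresent);

    vkEndCommandBuffer(cmd);
}
```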
You assume that the hardware has a strict separation between vertex processing and rasterization. This is true only for tile-based hardware.
Direct renderers just execute the whole pipeline, top to bottom, for each rendering command. They don't store post-transformed vertex data in large buffers. It just flows down to the next step. So if the "fragment stage" has to wait on a semaphore, then you can assume that all other stages will be idle as well while waiting.
No. The implementation would be unable to decide which image to give you next. This is precisely why you have to ask for an image: so that the implementation can figure out on its own which image it is safe for you to have.
Also, there's specific language in the specification that the semaphore and/or event you provide must not only be unsignaled but there cannot be any outstanding operations waiting on them. Why?
Because `vkAcquireNextImageKHR` can fail. If you have some operation in a queue that's waiting on a semaphore that's never going to fire, that will cause huge problems. You have to successfully acquire first, then submit work that is based on the semaphore.

Generally speaking, if you're regularly having trouble getting presentable images in a timely fashion, you need to make your swapchain longer. That's the point of having multiple buffers, after all.
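For illustration, a sketch of checking the acquire result before submitting anything that waits on its semaphore (swapchain recreation is left out):

```c
#include <stdbool.h>
#include <vulkan/vulkan.h>

/* Returns true only when an image was actually acquired and sem_post_acq
 * will be signaled; only then is it safe to submit work waiting on it. */
bool try_acquire(VkDevice device, VkSwapchainKHR swapchain,
                 VkSemaphore sem_post_acq, uint32_t *imageIndex)
{
    VkResult res = vkAcquireNextImageKHR(device, swapchain, UINT64_MAX,
                                         sem_post_acq, VK_NULL_HANDLE,
                                         imageIndex);
    if (res == VK_ERROR_OUT_OF_DATE_KHR) {
        /* No image acquired, so sem_post_acq will never fire; anything
         * already queued to wait on it would be stuck. Recreate the
         * swapchain and try again. */
        return false;
    }
    /* VK_SUBOPTIMAL_KHR is a success code: the image is acquired and the
     * semaphore will be signaled, so it is safe to proceed this frame. */
    return res == VK_SUCCESS || res == VK_SUBOPTIMAL_KHR;
}
```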
Like Nicol said, you can record secondaries independent of which image they will render to.

However, you can take it a step further and record command buffers for all swapchain images in advance, and select the correct one to submit based on the image acquired.

This type of reuse does take some extra consideration, because all memory ranges used are baked into the command buffer. But in many situations the required render commands don't actually change from one frame to the next; only a little bit of the data used does.
So the sequence of such a frame would be:

- Acquire an image with `vkAcquireNextImageKHR`
- Update only the small amount of per-frame data (uniform buffers, etc.)
- Submit the pre-recorded command buffer that matches the acquired image index
- Present with `vkQueuePresentKHR`
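A sketch of the up-front recording this relies on, assuming one framebuffer per swapchain image (all names here are placeholders):

```c
#include <vulkan/vulkan.h>

/* Record one command buffer per swapchain image ahead of time. The render
 * commands are identical for every image; only the framebuffer differs.
 * Per-frame data lives in descriptors/uniform buffers updated outside the
 * command buffers. */
void record_all(uint32_t imageCount, VkCommandBuffer *cmdBuffers,
                VkRenderPass renderPass, VkFramebuffer *framebuffers,
                VkExtent2D extent)
{
    for (uint32_t i = 0; i < imageCount; ++i) {
        VkCommandBufferBeginInfo begin = {
            .sType = VK_STRUCTURE_TYPE_COMMAND_BUFFER_BEGIN_INFO,
        };
        vkBeginCommandBuffer(cmdBuffers[i], &begin);

        VkClearValue clear = { .color = { .float32 = { 0.0f, 0.0f, 0.0f, 1.0f } } };
        VkRenderPassBeginInfo rpBegin = {
            .sType = VK_STRUCTURE_TYPE_RENDER_PASS_BEGIN_INFO,
            .renderPass = renderPass,
            .framebuffer = framebuffers[i], /* the only per-image difference */
            .renderArea = { { 0, 0 }, extent },
            .clearValueCount = 1,
            .pClearValues = &clear,
        };
        vkCmdBeginRenderPass(cmdBuffers[i], &rpBegin, VK_SUBPASS_CONTENTS_INLINE);
        /* ... identical draw commands for every swapchain image ... */
        vkCmdEndRenderPass(cmdBuffers[i]);
        vkEndCommandBuffer(cmdBuffers[i]);
    }
}
```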
Granted, this does not allow for conditional rendering, but the GPU is in general fast enough to render some geometry that ends up out of frame without any noticeable delay. So until the player reaches a loading zone where new geometry has to be displayed, you can keep those command buffers alive.