I'm writing an ebook reader app for the Windows Store. I'm using Direct2D + DXGI swap chains to render book pages on screen.
My book content is sometimes quite complex (geometry, bitmaps, masks, etc.), so it can take up to 100 ms to render. So I'm trying to do off-screen rendering to a bitmap on a separate thread, and then just show this bitmap on the main thread.
However, I can't figure out how to do it efficiently.
So far I've tried two approaches:
1. Use a single ID2D1Factory with the D2D1_FACTORY_TYPE_MULTI_THREADED flag, create an ID2D1BitmapRenderTarget and use it on the background thread for off-screen rendering. (This additionally requires ID2D1Multithread::Enter/Leave around IDXGISwapChain::Present operations.) The problem is that the ID2D1RenderTarget::EndDraw operation on the background thread sometimes takes up to 100 ms, and main-thread rendering is blocked for this period due to internal Direct2D locking.
2. Use a separate ID2D1Factory on the background thread (as described in http://www.sdknews.com/ios/using-direct2d-for-server-side-rendering) and turn off internal Direct2D synchronization. There is no cross-locking between the two threads in this case. Unfortunately, I can't use the resulting bitmap in the main ID2D1Factory directly, because it belongs to a different factory. I have to move the bitmap data to CPU memory, then copy it into the GPU memory of the main ID2D1Factory. This operation also introduces significant lag (I believe this is due to large memory accesses, but I'm not sure).
Is there a way to do this efficiently?
P.S. All the timings here are given for an Acer Switch 10 tablet. On a regular Core i7 PC both approaches work without any visible lag.
Ok, I've found a solution.
Basically, all I needed was to modify approach 2 to use DXGI resource sharing between the two DirectX factory sets. I'll skip all the gory details (they can be found here: http://xboxforums.create.msdn.com/forums/t/66208.aspx), but the basic steps are:
1. Using ID3D11Device2 from the main resource set, create a D3D 2D texture by CreateTexture2D with the D3D11_BIND_RENDER_TARGET, D3D11_BIND_SHADER_RESOURCE, D3D11_RESOURCE_MISC_SHARED_NTHANDLE and D3D11_RESOURCE_MISC_SHARED_KEYEDMUTEX flags.
2. Get a shared handle by querying the texture for IDXGIResource1 and calling CreateSharedHandle from it with DXGI_SHARED_RESOURCE_READ and DXGI_SHARED_RESOURCE_WRITE.
3. Open the shared texture in the second resource set with ID3D11Device2::OpenSharedResource1.
4. Acquire the keyed mutex (IDXGIKeyedMutex::AcquireSync), create a render target from the texture (ID2D1Factory2::CreateDxgiSurfaceRenderTarget), draw on it and release the mutex (IDXGIKeyedMutex::ReleaseSync).
Note that the mutex locking is necessary. Omitting it results in cryptic DirectX debug error messages, erroneous operation, or even crashes.
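To make the acquire/draw/release handoff concrete, here is a portable model of it. This is only a sketch and does not call DXGI: KeyedMutexModel, SharedPage, the two key constants and runHandoff are hypothetical names, and the class merely imitates the documented keyed-mutex behaviour (an acquire with key K waits until the mutex was last released with K, and a fresh mutex starts out released with key 0).

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Portable stand-in for IDXGIKeyedMutex (an assumption, not the DXGI API):
// acquire(key) blocks until the mutex was last released with that key.
class KeyedMutexModel {
public:
    void acquire(std::uint64_t key) {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [&] { return !held_ && key_ == key; });
        held_ = true;
    }
    void release(std::uint64_t key) {
        {
            std::lock_guard<std::mutex> lock(m_);
            held_ = false;
            key_ = key;  // decides who may acquire next
        }
        cv_.notify_all();
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::uint64_t key_ = 0;  // starts out released with key 0
    bool held_ = false;
};

constexpr std::uint64_t kRenderKey = 0;   // hypothetical: render thread's turn
constexpr std::uint64_t kPresentKey = 1;  // hypothetical: main thread's turn

// Hypothetical stand-in for the shared texture and its mutex.
struct SharedPage {
    KeyedMutexModel mutex;
    std::vector<int> pixels;
};

// Render thread: acquire, draw, then release to the main thread.
void renderPage(SharedPage& page, int value) {
    page.mutex.acquire(kRenderKey);
    page.pixels.assign(16, value);  // the slow drawing would happen here
    page.mutex.release(kPresentKey);
}

// Main thread: acquire, read, then hand the page back to the renderer.
std::vector<int> presentPage(SharedPage& page) {
    page.mutex.acquire(kPresentKey);
    std::vector<int> copy = page.pixels;
    page.mutex.release(kRenderKey);
    return copy;
}

// Renders pageCount pages on a background thread and returns the first
// pixel of each page as seen by the calling thread.
std::vector<int> runHandoff(int pageCount) {
    SharedPage page;
    std::thread renderer([&] {
        for (int i = 1; i <= pageCount; ++i) renderPage(page, i);
    });
    std::vector<int> seen;
    for (int i = 0; i < pageCount; ++i) {
        seen.push_back(presentPage(page).front());
    }
    renderer.join();
    return seen;
}
```

In this model presentPage blocks until the renderer is done. The real IDXGIKeyedMutex::AcquireSync takes a timeout in milliseconds, so the main thread could pass a short timeout and keep showing the previous page if the new one is not ready yet.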
tl;dr: Render to bitmaps on background thread in software mode. Draw from bitmaps to render target on UI thread in hardware mode.
The best approach I've been able to find so far is to use background threads with software rendering (IWICImagingFactory::CreateBitmap and ID2D1Factory::CreateWicBitmapRenderTarget), then copy the result to a hardware bitmap back on the thread with the hardware render target via ID2D1RenderTarget::CreateBitmapFromWicBitmap, and then blit that using ID2D1RenderTarget::DrawBitmap.
This is how paint.net 4.0 does selection rendering. When you're drawing a selection with the lasso tool, it will use a background thread to draw the selection outline asynchronously (the UI thread does not wait for this to complete). You can end up with a very complicated polygon due to the stroke style and animations. I render it 4 times, where each animation frame has a slightly different offset for the dashed stroke style.
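The "UI thread does not wait" part could be structured roughly like this. It is a portable sketch, not paint.net's code and not Direct2D: Frame stands in for the WIC bitmap, renderPageInSoftware for the slow software drawing, and FrameMailbox and renderInBackground are hypothetical names. The upload to a hardware bitmap would happen on the UI thread once tryTake returns a frame.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical CPU-side image; the WIC bitmap plays this role in the real app.
struct Frame {
    int pageNumber = 0;
    std::vector<std::uint32_t> pixels;  // one 32-bit value per pixel
};

// Single-slot mailbox: the worker publishes finished frames and the UI thread
// picks up the newest one. The lock is held only for a pointer swap, never
// for the duration of the rendering itself.
class FrameMailbox {
public:
    void publish(std::unique_ptr<Frame> frame) {
        std::lock_guard<std::mutex> lock(m_);
        latest_ = std::move(frame);  // an unclaimed older frame is dropped
    }
    // Returns nullptr if nothing new has finished since the last call.
    std::unique_ptr<Frame> tryTake() {
        std::lock_guard<std::mutex> lock(m_);
        return std::move(latest_);
    }
private:
    std::mutex m_;
    std::unique_ptr<Frame> latest_;
};

// Hypothetical software renderer standing in for the slow drawing.
std::unique_ptr<Frame> renderPageInSoftware(int pageNumber, int width, int height) {
    auto frame = std::make_unique<Frame>();
    frame->pageNumber = pageNumber;
    frame->pixels.assign(static_cast<std::size_t>(width) * height, 0xFFFFFFFFu);
    return frame;
}

// Renders one page on a worker thread and returns the frame that the
// polling (UI) side eventually receives.
std::unique_ptr<Frame> renderInBackground(int pageNumber, int width, int height) {
    FrameMailbox mailbox;
    std::thread worker([&] {
        mailbox.publish(renderPageInSoftware(pageNumber, width, height));
    });
    std::unique_ptr<Frame> shown;
    // Stand-in for the UI loop: each iteration is one tick that checks for a
    // finished frame and otherwise carries on with whatever it already has.
    while (!shown) {
        shown = mailbox.tryTake();
        std::this_thread::yield();
    }
    worker.join();
    return shown;
}
```

A single slot is enough here because the UI only ever wants the most recent result; older frames that were never shown can simply be discarded.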
Obviously this rendering can take a while as the polygon becomes more complex (that is, if you keep scribbling for a while). I have a few other special optimizations for when you use the Move Selection tool, which allows you to do transformations (rotate, translate, scale): if the background thread hasn't yet re-rendered the current polygon with the new transform, then I will render the old bitmap (with the current polygon and old transform) with the new transform applied. The selection outline may be distorted (scaling) or clipped (translated outside of the viewable area) while the background thread catches up, but it's a small price to pay for 60 fps responsiveness. This optimization works very well because you can't be modifying the polygon and transform of a selection at the same time.