Fast multi-window rendering

Posted 2020-05-19 02:10

Question:

I've been searching for and testing different rendering libraries for C# for many weeks now. So far I haven't found a single library that works well on multi-window rendering setups. The requirement is to run the program on 12+ monitor setups (financial charting) without latency on a fast computer. Each window needs to update multiple times every second. Meanwhile the CPU has to handle many intensive, time-critical tasks, so some of the burden has to be shifted to the GPUs. That's where hardware rendering comes in, in other words DirectX or OpenGL.

I have tried GDI+ with Windows Forms and found it far too slow for my needs. I have tried OpenGL via OpenTK (on a Windows Forms control), which seemed decently quick (I still have some tests to run on it) but was painfully difficult to get working properly (it is hard to find or write a good text rendering library). Recently I tried DirectX9, DirectX10 and Direct2D with Windows Forms via SharpDX. I tried both a separate device for each window and a single device with multiple swap chains. All of these resulted in very poor performance with multiple windows. For example, if I set the target FPS to 20 and open 4 full-screen windows on different monitors, the whole operating system starts lagging very badly. The rendering simply clears the screen to black; no primitives are rendered. CPU usage in this test was about 0% and GPU usage about 10%, so I don't understand what the bottleneck is. My development computer is very fast (i7 2700k, AMD HD7900, 16GB RAM), so the tests should definitely run well on it.

In comparison, I did some DirectX9 tests in C++ with the Win32 API (one device, multiple swap chains). I could open 100 windows spread all over the 4-monitor workspace, each with a rotating 3D teapot, and still had a perfectly responsive operating system. The FPS of the rendering windows dropped quite badly, of course, to around 5, which is what I would expect when running 100 simultaneous renderings.

Does anyone know a good way to do multi-window rendering in C#, or am I forced to rewrite my program in C++ to get that performance (a major pain)? I guess I'll give OpenGL another shot before I go the C++ route. I'll report any findings here.

Test methods for reference:

For the C# DirectX test with one device and multiple swap chains, I used the method from this excellent answer: Display Different images per monitor directX 10

Direct3D10 version:

I created the d3d10device and DXGIFactory like this:

D3DDev = new SharpDX.Direct3D10.Device(SharpDX.Direct3D10.DriverType.Hardware,
            SharpDX.Direct3D10.DeviceCreationFlags.None);
DXGIFac = new SharpDX.DXGI.Factory();

Then initialized the rendering windows like this:

var scd = new SwapChainDescription();
scd.BufferCount = 1;
scd.ModeDescription = new ModeDescription(control.Width, control.Height,
      new Rational(60, 1), Format.R8G8B8A8_UNorm);
scd.IsWindowed = true;
scd.OutputHandle = control.Handle;
scd.SampleDescription = new SampleDescription(1, 0);
scd.SwapEffect = SwapEffect.Discard;
scd.Usage = Usage.RenderTargetOutput;

SC = new SwapChain(Parent.DXGIFac, Parent.D3DDev, scd);

var backBuffer = Texture2D.FromSwapChain<Texture2D>(SC, 0);
_rt = new RenderTargetView(Parent.D3DDev, backBuffer);

The drawing commands executed on each rendering iteration are simply:

Parent.D3DDev.ClearRenderTargetView(_rt, new Color4(0, 0, 0, 0));
SC.Present(0, SharpDX.DXGI.PresentFlags.None);

The DirectX9 version is very similar:

Device initialization:

PresentParameters par = new PresentParameters();
par.PresentationInterval = PresentInterval.Immediate;
par.Windowed = true;
par.SwapEffect = SharpDX.Direct3D9.SwapEffect.Discard;
par.AutoDepthStencilFormat = SharpDX.Direct3D9.Format.D16;
par.EnableAutoDepthStencil = true;
par.BackBufferFormat = SharpDX.Direct3D9.Format.X8R8G8B8;

// firsthandle is the handle of first rendering window
D3DDev = new SharpDX.Direct3D9.Device(new Direct3D(), 0, DeviceType.Hardware, firsthandle,
    CreateFlags.SoftwareVertexProcessing, par);

Rendering window initialization:

if (parent.D3DDev.SwapChainCount == 0)
{
    SC = parent.D3DDev.GetSwapChain(0);
}
else
{
    PresentParameters pp = new PresentParameters();
    pp.Windowed = true;
    pp.SwapEffect = SharpDX.Direct3D9.SwapEffect.Discard;
    pp.BackBufferFormat = SharpDX.Direct3D9.Format.X8R8G8B8;
    pp.EnableAutoDepthStencil = true;
    pp.AutoDepthStencilFormat = SharpDX.Direct3D9.Format.D16;
    pp.PresentationInterval = PresentInterval.Immediate;

    SC = new SharpDX.Direct3D9.SwapChain(parent.D3DDev, pp);
}

Code for drawing loop:

SharpDX.Direct3D9.Surface bb = SC.GetBackBuffer(0);
Parent.D3DDev.SetRenderTarget(0, bb);

Parent.D3DDev.Clear(ClearFlags.Target, Color.Black, 1f, 0);
SC.Present(Present.None, new SharpDX.Rectangle(), new SharpDX.Rectangle(), HWND);
bb.Dispose();

The code for the C++ DirectX9/Win32 API test (one device, multiple swap chains) is here:

[C++] DirectX9 Multi-window test - Pastebin.com

It's a modified version of Kevin Harris's nice example code.

Edit:

Just to make it clear: my main problem with multi-window rendering is not low FPS, it's the general latency it causes in all operating system functions (window animations, drag and drop, scrolling, etc.).

Answer 1:

I can only speak for DirectX here, but I remember we had the same kind of issue once (5 graphics cards and 9 screens on a single PC).

A lot of the time, switching to full screen seems to enable vertical sync on the monitors. Since Present can't be threaded, the more screens you have with vertical sync enabled, the bigger the drop: each Present call waits somewhere between 0 and 16 milliseconds, and those waits add up across screens.
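To put rough numbers on that, here is a minimal sketch. It assumes a 60 Hz refresh rate and that every vsynced Present call blocks one after another on a single thread, as described above. The class and method names are hypothetical and only illustrate the arithmetic.

```csharp
using System;

static class VsyncBudget
{
    // Worst-case time spent blocked in Present per loop iteration when every
    // window waits a full refresh interval, one window after another.
    public static double WorstCaseWaitMs(int windows, double refreshHz)
    {
        double frameMs = 1000.0 / refreshHz; // about 16.7 ms at 60 Hz
        return windows * frameMs;
    }

    // Upper bound on the loop rate if that worst case happened every iteration.
    public static double WorstCaseFps(int windows, double refreshHz)
    {
        return 1000.0 / WorstCaseWaitMs(windows, refreshHz);
    }
}
```

With 9 screens at 60 Hz the worst case is about 150 ms of waiting per loop, i.e. under 7 iterations per second, which is in the same range as the slowdown we saw.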

The solution in our case was to create each window maximised with its borders removed. It's not ideal, but it took us from 10 fps drawing a rectangle back to the standard speed (60).
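A minimal sketch of that idea with Windows Forms, assuming one form per monitor. The helper name is hypothetical, and instead of WindowState = Maximized it sets the form bounds to the screen bounds, which gives a similar borderless window covering the monitor without a full-screen mode switch. The swap chain would still be created as windowed on the form's handle.

```csharp
using System.Windows.Forms;

static class BorderlessWindows
{
    // Creates (but does not show) a borderless form that covers the given
    // screen exactly. Hypothetical helper, not part of SharpDX or SlimDX.
    public static Form CreateFor(Screen screen)
    {
        return new Form
        {
            FormBorderStyle = FormBorderStyle.None,    // no title bar or borders
            StartPosition = FormStartPosition.Manual,  // honour Bounds below
            Bounds = screen.Bounds                     // cover the whole monitor
        };
    }
}
```

Calling BorderlessWindows.CreateFor(s).Show() for each s in Screen.AllScreens would give one such window per monitor.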

If you want a code sample, let me know and I'll prepare one.

Also, just as a test, I had a go at creating 30 windows in my engine using C#/SlimDX/DX11, each rendering a sphere with basic shading, and it still ran at well over 40 fps.



Answer 2:

We have a similar problem (we need to render 3D views on 9+ monitors using 3+ graphics cards). We opted to use raw DirectX11 after finding that 3rd party rendering libraries are all very poor at multiple windows across multiple monitors, let alone with multiple adapters too. (It seems most engines are designed for a fullscreen game and tend to do badly with windowed views.) Rather than using a 3rd party layer like SlimDX or SharpDX, we decided in the end to write the core renderer directly in C++ and expose only the simple API that our application needs via C++/CLI. This should maximise performance and minimise maintenance issues (such as relying on a 3rd party vendor for bug fixes).

Just like you, though, we found in testing that if we rendered 9 views from a single process (each rendered on its own thread), we got terrible performance (very low frame rates). If we ran 9 separate processes (one per view/monitor), the performance was as expected (excellent).

So, having spent days trawling the net fruitlessly for a better solution, we opted for simply running our renderers in separate processes. That is not a bad solution for us, as our renderers need to support distribution over multiple PCs anyway, so it just means we'll use this facility permanently instead of only when required.
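As a rough sketch of the process-per-monitor approach from the C# side: a launcher starts one renderer process per monitor and tells each which monitor it owns. The executable path and the "--monitor" argument are hypothetical; the renderer itself would have to parse that argument and create its window on the matching screen.

```csharp
using System.Collections.Generic;
using System.Diagnostics;

static class RendererLauncher
{
    // Builds the start info for one renderer process bound to one monitor.
    // "--monitor" is an assumed argument, not an existing convention.
    public static ProcessStartInfo BuildStartInfo(string rendererExe, int monitorIndex)
    {
        return new ProcessStartInfo
        {
            FileName = rendererExe,
            Arguments = "--monitor " + monitorIndex,
            UseShellExecute = false
        };
    }

    // Starts one renderer process per monitor and returns the handles so the
    // launcher can later wait for them or shut them down.
    public static List<Process> LaunchAll(string rendererExe, int monitorCount)
    {
        var processes = new List<Process>();
        for (int i = 0; i < monitorCount; i++)
        {
            processes.Add(Process.Start(BuildStartInfo(rendererExe, i)));
        }
        return processes;
    }
}
```

Each process then owns its own device and swap chain, so nothing is shared between views inside one process.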

(I don't know if this is helpful to you as an answer, but we'd also be very keen to know if there are any other solutions out there that work across multiple graphics cards, in case we're missing a better trick.)



Answer 3:

I have never had the opportunity to run this kind of scenario, but the one thing I'm pretty sure of is that using a managed wrapper is not the concern here: you would have exactly the same problem with C++ code.

Also, your description doesn't make clear how many graphics cards you have installed in your system. You should also follow the "DirectX Graphics Infrastructure (DXGI): Best Practices" more closely, as they describe a lot of the problems you could run into. Running fullscreen on different graphics cards with a swap chain correctly set up for fullscreen should be OK (using "flip" instead of "blit", see the MSDN documentation on this). If you are running your app in a maximized window, however, I don't think performance will be good, as the blit will interfere and produce some lag.

You can perfectly well have a single multithreaded application using multiple devices, one device per thread, and they should be able to schedule things correctly. But again, as I have no experience with this kind of scenario, there could be some kind of GPU scheduling problem in this specific case.

If the problem persists even after carefully following the DXGI setup, I would suggest debugging the whole thing with GPUView to examine these problems more closely. It is intended for exactly this kind of scenario, but you will have to take some time to learn how to make a diagnosis with this kind of tool. There was also a talk about GPUView at GDC 2012, Using GPUView to Understand your DirectX 11 Game (Jon Story), that is probably worth reading.



Answer 4:

Make sure you've disabled security checks for calls to native code (via SuppressUnmanagedCodeSecurityAttribute).

The associated stack walking is a performance killer.
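For your own P/Invoke declarations, a minimal sketch looks like this. The class name and the imported function are only illustrative; GetTickCount from kernel32.dll stands in for whatever native call you make in the render loop. As far as I know the attribute only matters on the .NET Framework, where code access security performs the stack walk, and wrappers such as SharpDX handle their own interop, so check how they call into native code before assuming this applies.

```csharp
using System.Runtime.InteropServices;
using System.Security;

// Applying the attribute to the class covers every P/Invoke declared in it:
// the runtime skips the unmanaged-code permission demand (and its stack walk)
// on each call. Only do this for native calls you fully trust.
[SuppressUnmanagedCodeSecurity]
static class NativeMethods
{
    // Illustrative native import; any DllImport declared here is affected.
    [DllImport("kernel32.dll")]
    public static extern uint GetTickCount();
}
```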