I know that hundreds of variations of these considerations have been posted all over the net. However, I haven't found anything that addresses my exact problem, so I hope you can help me see the light.
I'm currently toying with 2D game development in Java using OpenGL. The language and graphics library are not that relevant to my question, though, as it is of a more general nature.
I'm attempting to design a general-purpose game loop which can be used more or less as is for any game that has moderately heavy graphics (mostly bitmap textures) and possibly even heavier game logic (AI, collision detection, etc.).
Basically I'm expecting to maintain a list/array/buffer of objects that can be updated (positions, velocities, and other game-related state) and rendered (textures/frames at the updated positions) non-stop, as smoothly and efficiently as possible.
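To make that concrete, here is a minimal Java sketch of what such a buffer of updatable/renderable objects could look like; the GameObject and GameWorld names are just illustrations for this question, not part of any library:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical contract: every game object can be updated by the game logic
    // and rendered by the graphics code. Names are illustrative only.
    interface GameObject {
        void update(float dt);   // advance position, velocity, AI state, ...
        void render();           // draw the current state, e.g. a textured quad
    }

    // The shared buffer the rest of the question is about: one list that both
    // the update code and the render code touch.
    final class GameWorld {
        final List<GameObject> objects = new ArrayList<GameObject>();
    }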
1) One thread for update + rendering
I have tried and discarded a sequential solution using only one thread (well, two when counting user input), roughly as sketched below:
- Calculate changes and update buffer objects
- Render textures at updated positions to backbuffer
- Swap backbuffer to front
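A minimal Java sketch of that sequential loop, assuming a GameWorld/GameObject pair like the one sketched above; the running flag and swapBuffers() are placeholders for whatever the windowing/OpenGL binding provides (in LWJGL 2, for example, Display.update() swaps the buffers and typically blocks on vsync):

    abstract class SingleThreadedLoop {
        volatile boolean running = true;

        // Placeholder for whatever the OpenGL binding provides to present the frame.
        abstract void swapBuffers();

        void run(GameWorld world) {
            long last = System.nanoTime();
            while (running) {
                long now = System.nanoTime();
                float dt = (now - last) / 1_000_000_000f;
                last = now;
                for (GameObject o : world.objects) o.update(dt); // 1) calculate changes, update objects
                for (GameObject o : world.objects) o.render();   // 2) render at the updated positions
                swapBuffers();                                   // 3) swap back buffer to front (blocks)
            }
        }
    }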
Apparently a lot of good computing time is wasted while the buffer swap blocks on the hardware, which calls for a more efficient solution.
2) One thread for update, one for rendering
By splitting the program into an update thread and a rendering thread and synchronizing access to the shared buffer, I should be able to ensure a pretty steady frame rate. Synchronizing access to the shared buffer can be accomplished in many ways, but they all have one thing in common: they all prohibit thread concurrency. While this may be a fair tradeoff, I'm wondering what makes the synchronization necessary.
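For illustration, the most straightforward Java version of option 2 simply locks the shared list for the whole update pass and the whole render pass, which is exactly why the two threads never actually run concurrently (a sketch of the pattern, not anyone's actual code):

    import java.util.ArrayList;
    import java.util.List;

    // Coarse locking: while the updater holds the lock the renderer waits,
    // and vice versa, so the two passes never overlap.
    final class LockedWorld {
        private final Object lock = new Object();
        private final List<GameObject> objects = new ArrayList<GameObject>();

        void updateAll(float dt) {          // called by the update thread
            synchronized (lock) {
                for (GameObject o : objects) o.update(dt);
            }
        }

        void renderAll() {                  // called by the render thread
            synchronized (lock) {
                for (GameObject o : objects) o.render();
            }
        }
    }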
3) Same as 2, but without synchronization
I do understand many of the problems that can be caused by a careless implementation of concurrent threads: producer/consumer, readers/writers, and similar situations that can cause deadlocks. However, I do not understand why I need to ensure synchronization of the shared data if the following criteria are met (and they should be):
- The render thread can only read from the shared buffer
- The update thread can both read from, and write to the shared buffer (so it is the only 'writer')
- The shared buffer will never be empty nor full while the game is running
- The threads will never sleep
- The rendering does not have to be 100% accurate. If some shared objects have not yet been updated, leaving them one update step (i.e. some 10-20 ms) behind other objects, no one should ever notice.
So... what obvious thing am I missing here? Why do I need the synchronization for this setup?
- Can threads cache data causing problems when not properly synchronized?
- Or can data somehow be garbled if the writing thread is interrupted at an unfortunate time?
- Or are there any general problems with my proposed strategy making it useless?
Any thoughts, comments, or suggestions are most welcome. Or if this particular question has already been addressed elsewhere I would appreciate a reference.
The unsynchronized approach will work just fine for you; doubly so if this is stock Intel hardware¹. I would still not use it.
The reason why unsynchronized concurrency almost never works reliably is that processors have a free hand in deciding when to move stores and loads between main RAM and the caches. This can subvert almost any unsynchronized protocol. However, as you say, no one is likely to notice as long as the scene never changes abruptly in your application; all the data will reach RAM and become visible to the other thread sooner or later.
However, you have no guarantee of when that will happen or in what order, which leaves you with a theoretical possibility of mixing two consecutive frames (before and after an abrupt change of the scene or its lighting) in odd ways.
Depending on your programming language and its memory model (in your case Java's, rather than pre-C++11 C++), you are likely to find lightweight synchronization primitives whose guaranteed side effects are appropriate memory barriers and whose impact on performance will be negligible. This is what I would recommend as a starting point. Extreme performance optimizations (beyond what can be proven safe) should be the last stage of optimizing your engine.
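In Java, one concrete example of such a lightweight primitive is an AtomicReference (or a volatile field): publishing an immutable snapshot of the frame through it gives the renderer a happens-before guarantee, so it can never observe a half-written frame. A sketch under that assumption, with illustrative names (SpriteState, FramePublisher) that are not from the question:

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.concurrent.atomic.AtomicReference;

    // The update thread builds an immutable snapshot per pass and publishes it;
    // AtomicReference.set()/get() provide the required memory barriers.
    final class FramePublisher {
        static final class SpriteState {                 // illustrative snapshot type
            final float x, y;
            final int textureId;
            SpriteState(float x, float y, int textureId) { this.x = x; this.y = y; this.textureId = textureId; }
        }

        private final AtomicReference<List<SpriteState>> latestFrame =
                new AtomicReference<List<SpriteState>>(Collections.<SpriteState>emptyList());

        // Update thread: called once per completed update pass.
        void publish(List<SpriteState> sprites) {
            latestFrame.set(Collections.unmodifiableList(new ArrayList<SpriteState>(sprites)));
        }

        // Render thread: may get the same frame twice in a row, but never a torn one.
        List<SpriteState> latest() {
            return latestFrame.get();
        }
    }

With this kind of publication the renderer never blocks the updater; it simply draws the most recently published snapshot.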
¹ x86 never reorders stores relative to other stores. I don't think this is documented anywhere, and I would not like to rely on it. You can still have reordered reads, so it is no help in your scenario anyway.
I decided to spend a little time testing this, and since I've had so many good answers from this site, I thought I'd post this to complete this question. Maybe someone else will find the information useful.
I made 3 different implementations of a simple sprite-rendering application where the updating and the rendering run in separate threads.
1) No Synchronization
The Renderer runs at max 60 FPS.
The Updater runs as fast as possible
The sprites to update and render exist in a list shared by both threads. No synchronization exists, so the threads just read and write the data at will.
2) Synchronization of shared data
The Renderer runs at max 60 FPS.
The Updater runs at the same pace as the Renderer
The data to update and render exist in a list shared by both threads. The list is fully synchronized: the Updater updates all sprites in the list, then the Renderer gains access to the list and renders all sprites to the screen.
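For reference, a sketch of what that lock-step arrangement can look like in Java (illustrative code with a simple turn flag, not the actual test code); it makes it obvious why the two threads end up running at exactly the same rate:

    import java.util.ArrayList;
    import java.util.List;

    // Updater and Renderer take strict turns on one shared, synchronized list,
    // so neither thread can ever get ahead of the other.
    final class LockStepWorld {
        private final List<GameObject> sprites = new ArrayList<GameObject>();
        private boolean updated = false;      // guarded by the monitor of "this"

        synchronized void updatePass(float dt) throws InterruptedException {
            while (updated) wait();           // wait until the renderer has consumed the last pass
            for (GameObject o : sprites) o.update(dt);
            updated = true;
            notifyAll();
        }

        synchronized void renderPass() throws InterruptedException {
            while (!updated) wait();          // wait until the updater has produced a new pass
            for (GameObject o : sprites) o.render();
            updated = false;
            notifyAll();
        }
    }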
3) Synchronization using double rendering queues
The Renderer runs at max 60 FPS.
The Updater runs at the same pace as the Renderer
The Updater updates the list and sends the sprites to the passive queue of the two rendering queues. Meanwhile, the Renderer renders the sprites in the active rendering queue. When the Updater has copied the last object to the passive render queue, it attempts to swap the active and passive queues. If the Renderer has not finished rendering the previous queue, the swap blocks; this is the only blocking synchronization. As soon as the Renderer finishes the current frame, the swap is made, the Renderer can start rendering the new queue, and the Updater can start updating and sending to the other (now passive) queue.
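One way to sketch that swap in Java is with java.util.concurrent.Exchanger, whose exchange() call blocks until both threads arrive, which matches the "the swap blocks until the Renderer is done" behaviour described above (illustrative code only; the RenderCommand type and the placeholder values are mine, not the actual implementation):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.Exchanger;

    // Two queues, one per thread; the Exchanger swaps them and is the only blocking point.
    final class DoubleBufferedQueues {
        static final class RenderCommand {               // illustrative render-queue entry
            final float x, y;
            final int textureId;
            RenderCommand(float x, float y, int textureId) { this.x = x; this.y = y; this.textureId = textureId; }
        }

        private final Exchanger<List<RenderCommand>> swap = new Exchanger<List<RenderCommand>>();

        // Update thread: fill the passive queue, then trade it for the other one.
        void updateLoop(GameWorld world) throws InterruptedException {
            List<RenderCommand> queue = new ArrayList<RenderCommand>();
            while (!Thread.currentThread().isInterrupted()) {
                queue.clear();
                for (GameObject o : world.objects) {
                    o.update(1f / 60f);
                    // A real version would copy o's renderable state here; placeholder values below.
                    queue.add(new RenderCommand(0f, 0f, 0));
                }
                queue = swap.exchange(queue);            // blocks if the renderer is still busy
            }
        }

        // Render thread: draw the active queue, then trade it back.
        void renderLoop() throws InterruptedException {
            List<RenderCommand> queue = new ArrayList<RenderCommand>();
            while (!Thread.currentThread().isInterrupted()) {
                for (RenderCommand c : queue) {
                    // drawSprite(c) would issue the actual OpenGL calls for one sprite here.
                }
                queue = swap.exchange(queue);            // blocks until the updater has a new queue ready
            }
        }
    }

The Exchanger pairs the two threads up once per swap, so whichever thread finishes its queue first simply waits for the other before both continue with freshly exchanged queues.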
I ran 3 tests on each method, timing how many updates and renders were performed per second.
Test 1:
The number of sprites is low enough that the Renderer can run at full speed (60 FPS).
The update logic for each sprite is too heavy for the Updater to keep pace.
Test 2:
The number of sprites is too high for the Renderer to run at full speed.
The update logic for each sprite is extremely simple, so the Updater can more than keep up.
Test 3:
The number of sprites is exactly high enough to keep the Renderer running a little below max speed.
The update logic of each sprite is exactly heavy enough to keep the Updater running a little below the max speed of the Renderer.
The results
No sync - Test 1:
Renderer runs 60 times per second (max speed).
Updater runs 45 times per second.
No sync - Test 2:
Renderer runs 24 times per second.
Updater runs 1150 times per second.
No sync - Test 3:
Renderer runs 58 times per second.
Updater runs 51 times per second.
Sync shared data - Test 1:
Renderer runs 23 times per second.
Updater runs 24 times per second.
Sync shared data - Test 2:
Renderer runs 23 times per second.
Updater runs 23 times per second.
Sync shared data - Test 3:
Renderer runs 17 times per second.
Updater runs 17 times per second.
Sync double queue - Test 1:
Renderer runs 43 times per second.
Updater runs 43 times per second.
Sync double queue - Test 2:
Renderer runs 24 times per second.
Updater runs 24 times per second.
Sync double queue - Test 3:
Renderer runs 54 times per second.
Updater runs 54 times per second.
Conclusion
As you pointed out, Jirka, even if the no-synchronization method seems harmless when there is only one writer, it can have unwanted side effects, and it certainly doesn't keep the rendered frame consistent.
It is no great surprise that rendering with dual queues is faster than rendering from one large, fully synchronized shared sprite list. What did surprise me is that, once you consider that nothing is gained from rendering multiple frames without an update in between, or from updating multiple times without rendering, the dual-queue method ends up effectively as fast as the unsynchronized one.
There are probably other things that could be said or tried, but I have seen enough. I will never consider using unsynchronized access for an update/render system again.
It is possible to have separate render and update threads without (much) synchronization. Check out
http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part1/
and
http://blog.slapware.eu/game-engine/programming/multithreaded-renderloop-part2/
for an explanation and an implementation (source + binaries). It isn't easy, but it's exactly what you want.