I don't quite understand the documentation for InitializeCriticalSectionAndSpinCount: http://msdn.microsoft.com/en-us/library/windows/desktop/ms683476(v=vs.85).aspx
It says "You can improve performance significantly by choosing a small spin count ..."
However, since waiting on a spinner is faster than waiting for an object, doesn't it make sense to have the SpinCount as high as possible? What am I missing? Thanks.
(I am using it inside a C DLL used by a multi-threaded application)
Here is the code for the critical section, called constantly by a large number of threads:
int g_slots[256] = {0};
...
slot = 256;
EnterCriticalSection(&g_LockHandle);
while (slot-- > 0)
{
    if (g_slots[slot] == 0)
    {
        g_slots[slot] = spid;
        break;
    }
}
LeaveCriticalSection(&g_LockHandle);
Followup comments:
For anyone who is interested, here are my informal results from testing on a 4-core server running Windows Server 2008 R2. For an ultra-fast operation, such as testing and incrementing a single variable, Interlocked wins hands down. A distant second is CriticalSection+SpinCount with a low spin count (e.g., 16), followed by plain old CriticalSection. However, when scanning an array (e.g., of integers), Interlocked comes in third, after CriticalSection (with or without SpinCount). CriticalSection with a high SpinCount was the slowest in all cases.
Neil Weicher www.netlib.com
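For readers wondering what an Interlocked-style version of the array scan might look like, here is a minimal sketch. It is not the code that was measured above: it uses the C11 function atomic_compare_exchange_strong as a portable stand-in for InterlockedCompareExchange, and claim_slot and SLOT_COUNT are names invented for illustration.

```c
#include <assert.h>
#include <stdatomic.h>

#define SLOT_COUNT 256  /* hypothetical name for the array size used in the question */

/* Static storage, so every slot starts at 0, which means "free". */
static atomic_int g_slots[SLOT_COUNT];

/* Claim the highest-numbered free slot for spid.
   Returns the slot index, or -1 if every slot is taken. */
static int claim_slot(int spid)
{
    int slot = SLOT_COUNT;

    assert(spid != 0);  /* 0 is reserved as the "free" marker */

    while (slot-- > 0) {
        int expected = 0;
        /* Atomically: if the slot still holds 0, store spid and succeed. */
        if (atomic_compare_exchange_strong(&g_slots[slot], &expected, spid))
            return slot;
    }
    return -1;
}
```

Note that every probe here is an atomic read-modify-write, even on slots that are already occupied. That is one plausible reason, though not a verified one, why a long scan could lose to a single critical section around plain reads and writes.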
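What the documentation actually says, with my emphasis on the text that you removed (the final phrase, "for a critical section of short duration"), is:
"You can improve performance significantly by choosing a small spin count for a critical section of short duration."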
So, the choice of spin count depends very critically on the duration of the critical section.
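You ask:
"However, since waiting on a spinner is faster than waiting for an object, doesn't it make sense to have the SpinCount as high as possible?"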
It is simply not true that spinning is faster than blocking. For a long duration critical section, it is best to avoid spinning altogether. If it is likely that the lock won't be released for a significant amount of time, then the best policy is to block immediately and wait until you can acquire the lock. Even for a short duration section, it is possible that the thread that holds the lock is not scheduled to run, in which case spinning is clearly wasteful of CPU resource.
Spinning is only beneficial if there is a good probability that the lock can be acquired whilst spinning, and even then only if the time spent spinning is less than the cost of yielding, that is, the cost of a context switch.
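To make the trade-off concrete, here is a minimal sketch of what a spin count means conceptually: retry the lock a bounded number of times, then give up so that the caller can block. It is written with C11 atomics rather than the Win32 API, the names spin_lock_t, spin_try_acquire and spin_release are invented for illustration, and it should not be read as how CRITICAL_SECTION is actually implemented.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical illustration only; not the real CRITICAL_SECTION internals. */
typedef struct {
    atomic_flag held;  /* set while some thread owns the lock */
} spin_lock_t;

/* Try to take the lock, retrying at most spin_count times after the
   first attempt. Returns true if the lock was acquired. A false return
   means the spin budget is exhausted and the caller should block
   (for example, wait on a kernel object) instead of burning more CPU.
   *spins_used reports how many retries were made. */
static bool spin_try_acquire(spin_lock_t *lock, unsigned spin_count,
                             unsigned *spins_used)
{
    unsigned spins = 0;

    assert(lock != NULL && spins_used != NULL);

    for (;;) {
        /* test_and_set returns the previous value: false means it was free. */
        if (!atomic_flag_test_and_set_explicit(&lock->held,
                                               memory_order_acquire)) {
            *spins_used = spins;
            return true;
        }
        if (spins == spin_count) {
            *spins_used = spins;
            return false;
        }
        spins++;
    }
}

static void spin_release(spin_lock_t *lock)
{
    atomic_flag_clear_explicit(&lock->held, memory_order_release);
}
```

Every retry in that loop is work that produces nothing unless the owner releases the lock in the meantime. With a large spin_count and a long critical section, each waiting thread pays for the full spin and then blocks anyway.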
I agree with the statement "You can improve performance significantly by choosing a small spin count" itself.
When I tested my object pool class, which uses InitializeCriticalSectionAndSpinCount, on an 8-core PC, the best value was less than 20. The larger the spin count, the slower it ran.
This is what I deduce from that test result:
I don't think the spin count should ever be more than a few thousand. Spinning is a busy-wait. It not only consumes CPU time but also uses a lot of bandwidth between the CPU and RAM, so it may starve the traffic between the other CPUs and RAM.