可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效，请关闭广告屏蔽插件后再试):

问题:

I have read so many times, here and everywhere on the net, that mutexes are slower than critical section/semaphores/insert-your-preferred-synchronisation-method-here. but i have never seen any paper or study or whatever to back up this claim.

so, where does this idea come from ? is it a myth or a reality ? are mutexes really slower ?

回答1:

In the book "Multithreading application in win32" by Jim Beveridge and Robert Wiener it says: "It takes almost 100 times longer to lock an unowned mutex than it does to lock an unowned critical section because the critical section can be done in user mode without involving the kernel"

And on msdn here it says "critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization"

回答2:

I don't believe that any of the answers hit on the key point of why they are different.

Mutexes are at operating system level. A named mutex exists and is accessible from ANY process in the operating system (provided its ACL allows access from all).

Critical sections are faster as they don't require the system call into kernel mode, however they will only work WITHIN a process, you cannot lock more than one process using a critical section. So depending on what you are trying to achieve and what your software design looks like, you should choose the most appropriate tool for the job.

I'll additionally point out to you that Semaphores are separate to mutex/critical sections, because of their count. Semaphores can be used to control multiple concurrent access to a resource, where as a mutex/critical section is either being accessed or not being accessed.

回答3:

A CRITICAL_SECTION is implemented as a spinlock with a capped spin count. See MSDN InitializeCriticalSectionAndSpinCount for the indication of this.

When the spin count 'elapsed', the critical section locks a semaphore (or whatever kernel-lock it is implemented with).

So in code it works like this (not really working, should just be an example) :

CRITICAL_SECTION s;

void EnterCriticalSection( CRITICAL_SECTION* s )
{
    int spin_count = s.max_count;
    while( --spin_count >= 0 )
    {
        if( InterlockedExchange( &s->Locked, 1 ) == 1 )
        {
           // we own the lock now
           s->OwningThread = GetCurrentThread();
           return;
        }
    }
    // lock the mutex and wait for an unlock
    WaitForSingleObject( &s->KernelLock, INFINITE );
}

So if your critical section is only held a very short time, and the entering thread does only wait very few 'spins' (cycles) the critical section can be very efficient. But if this is not the case, the critical section wastes many cycles doing nothing, and then falls back to a kernel synchronization object.

So the tradeoff is :

Mutex : Slow acquire/release, but no wasted cycles for long 'locked regions'

CRITICAL_SECTION : Fast acquire/release for unowned 'regions', but wasted cycles for owned sections.

回答4:

Yes, critical sections are more efficient. For a very good explanation, get "Concurrent Programming on Windows".

In a nutshell: a mutex is a kernel object, so there is always a context switch when you acquire one, even if "free". A critical section can be acquired without a context switch in that case, and (on an multicore/processor machine) it will even spin a few cycles if it's blocked to prevent the expensive context switch.

回答5:

A mutex (at least in windows) allows for synchronizations between different processes in addition to threads. This means extra work must be done to ensure this. Also, as Brian pointed out, using a mutex also requires a switch to "kernel" mode, which causes another speed hit (I believe, i.e. infer, that the kernel is required for this interprocess synchronization, but I've got nothing to back me up on that).

Edit: You can find explicit reference to interprocess synchronization here and for more info on this topic, have a look at Interprocess Synchronization