You can see an interesting table at this link. http://norvig.com/21-days.html#answers
The table described,
Mutex lock/unlock 25 nanosec
fetch from main memory 100 nanosec
Nanosec?
I surprised because mutex lock
is faster than fetch data from memory
. If so, what mutex lock
exactly do? And what does Mutex lock
mean at the table?
The article you linked does not mentioned the architecture, but judging by mentions of L1 and L2 cache it's Intel. If this is so, then I think that by mutex they meant LOCK instruction. In this respect this post seems relevant: Intel 64 and IA-32 | Atomic operations including acquire / release semantic
Also Intel software developer's manual can help if you know, what you are looking for. I'd read anything relevant I could find about the LOCK instruction.
Let's say that ten people had to share a pen (maybe they work at a really cash-strapped company). Since they have to write long documents with the pen, but most of the work in writing a document is just thinking of what to say, they agree that each person gets to use the pen to write one sentence of the document, and then has to make it available to the rest of the group.
Now we have a problem: what if two people are done thinking about the next sentence, and both want to use the pen at once? We could just say that both people can grab the pen, but this is a fragile old pen, so if two people grab it then it breaks. Instead, we draw a chalk line around the pen. First you put your hand across the chalk line, then you grab the pen. If one person's hand is inside the chalk line, then nobody else is allowed to put their hand inside the chalk line. If two people try to put their hand across the chalk line at the same time, under these rules only one of them will get inside the chalk line first, so the other has to pull back their hand and keep it just outside the chalk line until the pen is available again.
Let's relate this back to mutexes. A mutex is a way to protect a shared resource (the pen) for a short period of time called the critical section (the time to write one sentence of a document). Whenever you want to use the resource, you agree to call mutex_lock
first (put your hand inside the chalk line). Whenever you're done with the resource, you agree to call mutex_unlock
(take your hand out from the chalk line area).
Now to how mutexes are implemented. A mutex is usually implemented with shared memory. There is some shared opaque data object called a mutex, and the mutex_lock
and mutex_unlock
functions both take a pointer to one of these. The mutex_lock
function checks and modifies data inside the mutex using an atomic test-and-set or load-linked/store-conditional instruction sequence (on x86, xhcg
is often used), and either "acquires the mutex" - sets the contents of the mutex object to indicate to other threads that the critical section is locked - or has to wait. Eventually, the thread gets the mutex, does the work inside the critical section, and calls mutex_unlock
. This function sets the data inside the mutex to mark it as available, and possibly wakes up sleeping threads that have been trying to acquire the mutex (this depends on the mutex implementation - some implementations of mutex_lock
just spin in a tight look on xchg
until the mutex is available, so there is no need for mutex_unlock
to notify anybody).
Why would locking a mutex be faster than going out to memory? In short, caching. The CPU has a cache that can be accessed very quickly, so the xchg
operation doesn't need to go all the way out to memory as long as the processor can ensure that there is no other processor accessing that data. But x86 has a notion of "owning" a cache line - if processor 0 owns a cache line, any other processor that wants to use data in that cache line has to go through processor 0. This way, there is no need for the xhcg
operation to look at any data beyond the cache, and cache access tends to be very fast, so acquiring an uncontested mutex is faster than a memory access.
There is one caveat to that last paragraph, though: the speed benefit only holds for an uncontested mutex lock. If two threads try to lock the same mutex at the same time, the processors that are running those threads have to communicate and deal with ownership of the relevant cache line, which greatly slows down the mutex acquisition. Also, one of the two threads will have to wait for the other thread to execute the code in the critical section and then release the mutex, further slowing down the mutex acquisition for one of the threads.