Cost of mutex, critical section, etc. on Windows

Posted 2019-03-16 01:47

Question:

I read somewhere that the overhead of a mutex is small, because a context switch only happens when there is contention.

On Linux these are known as futexes.
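The idea behind futexes is that an uncontended lock is a single atomic instruction in user space, and the kernel is only involved when a thread actually has to wait. Below is a minimal C++ sketch of that idea: a three-state lock (0 = free, 1 = held, 2 = held with possible waiters). The names `FutexStyleLock` and `count_with_futex_lock` are hypothetical. Sleeping uses C++20 `std::atomic::wait`/`notify_one` when the standard library provides them, and otherwise falls back to `std::this_thread::yield()`. This illustrates the technique only. It is not how glibc or Windows actually implement their locks.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical futex-style lock.
// state_: 0 = unlocked, 1 = locked, 2 = locked and other threads may be waiting.
class FutexStyleLock {
public:
    void lock() {
        int c = 0;
        // Fast path: an uncontended acquire is one compare-exchange, with no waiting.
        if (state_.compare_exchange_strong(c, 1, std::memory_order_acquire)) return;
        // Slow path: mark the lock as contended, then wait until we get it.
        if (c != 2) c = state_.exchange(2, std::memory_order_acquire);
        while (c != 0) {
            wait_while_contended();
            c = state_.exchange(2, std::memory_order_acquire);
        }
    }

    void unlock() {
        assert(state_.load(std::memory_order_relaxed) != 0);  // debug check: must be held
        // Fast path: 1 -> 0 means nobody was waiting, so nobody needs waking.
        if (state_.fetch_sub(1, std::memory_order_release) != 1) {
            state_.store(0, std::memory_order_release);
            wake_one();
        }
    }

private:
    void wait_while_contended() {
#if defined(__cpp_lib_atomic_wait)
        state_.wait(2);             // C++20: sleep while the state is still 2
#else
        std::this_thread::yield();  // older standard library: just give up the time slice
#endif
    }

    void wake_one() {
#if defined(__cpp_lib_atomic_wait)
        state_.notify_one();
#endif
    }

    std::atomic<int> state_{0};
};

// Hypothetical helper: `threads` workers each increment a shared counter `iterations` times.
long count_with_futex_lock(int threads, int iterations) {
    FutexStyleLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                lock.lock();
                ++counter;
                lock.unlock();
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

An uncontended `lock()`/`unlock()` pair costs one compare-exchange and one `fetch_sub`. Only the contended branch ever waits.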

Does the same hold true on Windows? Is a critical section the closer equivalent of a Linux mutex?

From what I gathered, critical sections perform better than mutexes. Is this true in every case?

Is there a corner case where mutexes are faster than critical sections on Windows?

Assume that only threads of a single process access the mutexes (just to rule out the other benefit of critical sections).

Additional info: OS: Windows Server
Language: C++

Answer 1:

Given the specific purpose of critical sections and mutexes, I don't think cost is really the question: you don't have much alternative when multiple threads touch the same data. Of course, if you just need to increment or decrement a number, you can use the Interlocked*() functions on a volatile variable and you're good to go. For anything more complex, you need a synchronization object.
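For the simple-counter case, here is a sketch that uses `std::atomic` as the portable C++ counterpart of `InterlockedIncrement`. In standard C++, `volatile` alone does not make an increment atomic. The atomic type is what provides that guarantee. The function name `count_with_atomic` is hypothetical.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Portable stand-in for InterlockedIncrement: each fetch_add is one indivisible
// read-modify-write, so a plain counter needs no critical section or mutex.
long count_with_atomic(int threads, int iterations) {
    std::atomic<long> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                // Relaxed order is enough: only the final total matters.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& th : pool) th.join();  // join() makes every increment visible here
    return counter.load();
}
```

This only works because the whole shared state is one integer. As soon as two values must change together, you are back to needing a lock.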

Start your reading with the documentation on the synchronization objects available on Windows. All the functions are listed there, nicely grouped and properly explained. Some are available only on Windows 8 and later.

As for your question, critical sections are less expensive than mutexes because they are designed to work within a single process. Read the documentation on both, or just the following quote:

A critical section object provides synchronization similar to that provided by a mutex object, except that a critical section can be used only by the threads of a single process. Event, mutex, and semaphore objects can also be used in a single-process application, but critical section objects provide a slightly faster, more efficient mechanism for mutual-exclusion synchronization (a processor-specific test and set instruction). Like a mutex object, a critical section object can be owned by only one thread at a time, which makes it useful for protecting a shared resource from simultaneous access. Unlike a mutex object, there is no way to tell whether a critical section has been abandoned.
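Here is a minimal sketch of using a critical section from C++, assuming a Windows build. The hypothetical `ProcessLocalLock` class wraps `InitializeCriticalSection`, `EnterCriticalSection`, `LeaveCriticalSection` and `DeleteCriticalSection`. It exposes `lock()`/`unlock()` so that it works with `std::lock_guard`. On non-Windows builds it falls back to `std::mutex`, only so that the example compiles.

A Win32 mutex works differently. It is a kernel object, created with `CreateMutexW` and acquired with `WaitForSingleObject`. That is what lets it work across processes and report abandonment through `WAIT_ABANDONED`.

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>
#ifdef _WIN32
#include <windows.h>
#endif

// RAII wrapper around a CRITICAL_SECTION (usable by threads of one process only).
// On non-Windows builds it falls back to std::mutex so the sketch still compiles.
class ProcessLocalLock {
public:
#ifdef _WIN32
    ProcessLocalLock() { InitializeCriticalSection(&cs_); }
    ~ProcessLocalLock() { DeleteCriticalSection(&cs_); }
    void lock() { EnterCriticalSection(&cs_); }
    void unlock() { LeaveCriticalSection(&cs_); }
#else
    ProcessLocalLock() = default;
    void lock() { m_.lock(); }
    void unlock() { m_.unlock(); }
#endif
    ProcessLocalLock(const ProcessLocalLock&) = delete;
    ProcessLocalLock& operator=(const ProcessLocalLock&) = delete;

private:
#ifdef _WIN32
    CRITICAL_SECTION cs_;
#else
    std::mutex m_;
#endif
};

// Hypothetical helper: lock()/unlock() make the class usable with std::lock_guard.
long count_with_process_lock(int threads, int iterations) {
    ProcessLocalLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<ProcessLocalLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```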

I use critical sections for same-process synchronization and mutexes for cross-process synchronization. I use a mutex within the same process only when I REALLY need to know whether a synchronization object was abandoned.

So if you need a sync object, the question is not what it costs but which one is cheaper :) The only alternative is memory corruption.

PS: There may be alternatives, such as the one mentioned in the accepted answer to a related question, but I always prefer core platform-specific functionality over cross-platform code. In my experience it's faster! So if you use Windows, use the tools of Windows :)

UPDATE

Depending on your needs, you may be able to reduce the need for sync objects. Do as much self-contained work in each thread as possible, and only combine the data at the end or every now and then.

Stupid example: take a list of URLs. You need to scrape and analyze them.

  1. Throw in a bunch of threads and have them pick URLs, one by one, from the shared input list. For each one you process, you centralize the result as you go. It's real time and cool, but every pick and every result needs a lock.
  2. Or you can give each thread its own slice of the input URLs. This removes the need to synchronize the selection process. Each thread stores its analysis results locally and combines them with the shared result just once at the end. Alternatively, it can combine them, say, once every 10 URLs, rather than for every URL. This reduces the number of sync operations dramatically.
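Approach 2 can be sketched as follows. The scraping and analysis step is replaced by a hypothetical `analyze()` that just returns the URL length, so the result is easy to check. Each thread works on its own slice without locking, then takes the shared lock exactly once to merge its partial result. The function name `analyze_in_slices` is also hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for "scrape and analyze": the score of a URL is its length.
std::size_t analyze(const std::string& url) { return url.size(); }

// Each thread gets a contiguous slice, accumulates a local result without locking,
// and merges into the shared total exactly once.
std::size_t analyze_in_slices(const std::vector<std::string>& urls, unsigned threads) {
    assert(threads > 0);
    std::size_t total = 0;
    std::mutex total_mutex;
    std::vector<std::thread> pool;
    const std::size_t chunk = (urls.size() + threads - 1) / threads;  // ceiling division
    for (unsigned t = 0; t < threads; ++t) {
        const std::size_t first = std::min(urls.size(), t * chunk);
        const std::size_t last = std::min(urls.size(), first + chunk);
        pool.emplace_back([&urls, &total, &total_mutex, first, last] {
            std::size_t local = 0;  // thread-private, no sync needed
            for (std::size_t i = first; i < last; ++i) local += analyze(urls[i]);
            std::lock_guard<std::mutex> guard(total_mutex);  // the only lock this thread takes
            total += local;
        });
    }
    for (auto& th : pool) th.join();
    return total;
}
```

Merging every 10 URLs instead of once per thread follows the same pattern, with the locked merge moved inside the loop.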

So costs can be lowered by choosing the right tool and by thinking about how to reduce the number of locks and unlocks. But they cannot be eliminated :)

PS: I only think in URLs :)

UPDATE 2:

I needed to take some measurements for a project, and the results were quite surprising:

  • std::mutex is the most expensive (the price of cross-platformness).
  • A native Windows mutex is 2x faster than std::mutex.
  • A critical section is 2x faster than the native mutex.
  • A Slim Reader/Writer (SRW) lock is within ±10% of the critical section.
  • My homemade InterlockedMutex (a spinlock) is 1.25x-1.75x faster than the critical section.
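The author's InterlockedMutex is not shown. The following is only a generic sketch of a small spinlock in portable C++, and the `SpinLock` class and `count_with_spinlock` helper are hypothetical names. It takes the lock with one atomic exchange. While the lock is busy, it waits using plain loads and yields the time slice between checks.

```cpp
#include <atomic>
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical spinlock built on a single atomic flag (not the author's code).
class SpinLock {
public:
    void lock() {
        for (;;) {
            // exchange() returns the old value: false means we just took the lock.
            if (!locked_.exchange(true, std::memory_order_acquire)) return;
            // Busy: wait with plain loads (no writes to the cache line), yielding in between.
            while (locked_.load(std::memory_order_relaxed)) std::this_thread::yield();
        }
    }
    bool try_lock() { return !locked_.exchange(true, std::memory_order_acquire); }
    void unlock() {
        assert(locked_.load(std::memory_order_relaxed));  // debug check: must be held
        locked_.store(false, std::memory_order_release);
    }

private:
    std::atomic<bool> locked_{false};
};

// Hypothetical helper: `threads` workers each increment a shared counter `iterations` times.
long count_with_spinlock(int threads, int iterations) {
    SpinLock lock;
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iterations; ++i) {
                std::lock_guard<SpinLock> guard(lock);
                ++counter;
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

A spinlock like this never puts a thread to sleep. That can make it cheaper when the lock is uncontended and held only briefly, but it burns CPU when threads do contend. This trade-off is why results like the ones above depend heavily on the workload.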


Answer 2:

Compared with std::mutex on Windows 8, I usually get a 3-4x speedup (in the uncontended case) with my own custom-made spin lock:

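Mutex-based:

// TimeIt, tries, data and mutex (a std::mutex) come from the surrounding test code, not shown.
auto time = TimeIt([&]() {
    for (int i = 0; i < tries; i++) {
        if (mutex.try_lock()) {
            data.value = 1;
            mutex.unlock();  // calling try_lock on a std::mutex this thread already owns is undefined behavior
        }
    }
});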

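Home-made spin lock (atomic exchange):

// guard is a std::atomic<bool> that starts out false.
time = TimeIt([&]() {
    for (int i = 0; i < tries; i++) {
        if (!guard.exchange(true)) {
            // exchange returned false: this thread now owns the guard
            data.value = 1;
            guard.store(false);  // release the guard so the next iteration can take it
        }
    }
});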

Tests were run on x86.

I haven't figured out what std::mutex uses underneath on Windows, because it generates a lot of code.
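The snippets above depend on `TimeIt`, `tries`, `data`, `mutex` and `guard`, which the answer does not show. Here is a self-contained sketch of a comparable uncontended benchmark. The names `time_it`, `Data`, `BenchResult` and `compare_uncontended` are hypothetical. The mutex loop uses `lock()` instead of `try_lock()` because `std::mutex::try_lock` is allowed to fail spuriously.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>

// Hypothetical replacement for the answer's TimeIt: runs f once, returns elapsed nanoseconds.
template <typename F>
long long time_it(F&& f) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(stop - start).count();
}

struct Data {
    volatile int value = 0;  // volatile so the stores are not optimized away
};

struct BenchResult {
    long long mutex_ns = 0, flag_ns = 0;
    int mutex_acquired = 0, flag_acquired = 0;
};

// Single-threaded (uncontended) comparison of std::mutex and an atomic flag.
BenchResult compare_uncontended(int tries) {
    std::mutex mutex;
    std::atomic<bool> guard{false};
    Data data;
    BenchResult r;
    r.mutex_ns = time_it([&] {
        for (int i = 0; i < tries; i++) {
            std::lock_guard<std::mutex> lock(mutex);  // lock() never fails spuriously
            data.value = 1;
            ++r.mutex_acquired;
        }
    });
    r.flag_ns = time_it([&] {
        for (int i = 0; i < tries; i++) {
            if (!guard.exchange(true, std::memory_order_acquire)) {  // false: we own it
                data.value = 1;
                ++r.flag_acquired;
                guard.store(false, std::memory_order_release);        // release it
            }
        }
    });
    return r;
}
```

Absolute timings depend on the compiler, the C++ runtime version and the CPU. Measure on your own setup rather than relying on the ratios quoted in this thread.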