Note that I can research the Boost source code myself, and may do so to satisfy my own curiosity if nobody out there has an answer.
I ask, however, because someone may already have done this comparison and be able to answer authoritatively.
It would seem that by creating a shared memory-mapped file between processes and building on InterlockedIncrement(), one could create a largely user-mode mutex akin to a CRITICAL_SECTION, which would be considerably more performant than the Win32 Mutex for interprocess synchronisation.
So my expectation is that the Win32 implementation of boost::interprocess_mutex has probably been written in this manner, and that it is substantially quicker than the native API offering.
However, this is only a supposition: I have not field-tested the performance of boost::interprocess_mutex for interprocess synchronisation, nor have I deeply investigated its implementation.
Does anyone have experience using it or profiling its relative performance, or can anyone comment on the safety of using InterlockedIncrement() across processes via shared memory?
In Boost 1.39.0, there is specific support only for pthreads. On all other platforms, it becomes a busy-loop with a yield call in the middle (essentially the same system that you describe). See boost/interprocess/sync/emulation/interprocess_mutex.hpp. For example, here's the implementation of lock():
    inline void interprocess_mutex::lock(void)
    {
       do{
          boost::uint32_t prev_s = detail::atomic_cas32(const_cast<boost::uint32_t*>(&m_s), 1, 0);
          if (m_s == 1 && prev_s == 0){
             break;
          }
          // relinquish current timeslice
          detail::thread_yield();
       }while (true);
    }
What this means is that a contended boost::interprocess::interprocess_mutex on Windows is VERY expensive, although the uncontended case is almost free. This could potentially be improved by adding an event object or similar to sleep on, but this would not fit well with boost::interprocess's API, as there would be nowhere to put the per-process HANDLE needed to access the mutex.
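To make the cost trade-off concrete, here is a minimal sketch of the same spin-and-yield idea in portable C++. It is not the Boost code: the class name SpinYieldMutex is made up for illustration, and std::atomic and std::this_thread::yield() stand in for detail::atomic_cas32 and detail::thread_yield().

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <thread>

// Hypothetical name, for illustration only; not part of Boost or Win32.
class SpinYieldMutex {
public:
    void lock() {
        for (;;) {
            std::uint32_t expected = 0;
            // Atomically flip 0 (free) to 1 (held). This is the whole
            // uncontended path: one compare-and-swap, no kernel call.
            if (state_.compare_exchange_weak(expected, 1,
                                             std::memory_order_acquire,
                                             std::memory_order_relaxed)) {
                return;
            }
            // Contended path: give up the timeslice and retry. The waiter
            // is never put to sleep, so it keeps getting rescheduled.
            std::this_thread::yield();
        }
    }

    bool try_lock() {
        std::uint32_t expected = 0;
        return state_.compare_exchange_strong(expected, 1,
                                              std::memory_order_acquire,
                                              std::memory_order_relaxed);
    }

    void unlock() {
        // Debug-only sanity check that the mutex is actually held.
        assert(state_.load(std::memory_order_relaxed) == 1);
        state_.store(0, std::memory_order_release);
    }

private:
    std::atomic<std::uint32_t> state_{0};  // 0 = free, 1 = held
};
```

Because it provides lock(), try_lock() and unlock(), the sketch works with std::lock_guard<SpinYieldMutex>. The state is a single 32-bit word, which is why a lock like this can live in shared memory without any per-process handle, and also why a waiter has nothing to sleep on.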
It would seem that creating a shared memory mapped file between processes, and through construction with InterlockedIncrement() one could create a largely usermode mutex akin to a CRITICAL_SECTION, which would be considerably more performant than the Win32 Mutex for interprocess synchronisation.
CRITICAL_SECTION internally can use a synchronization primitive when there's contention. I forget if it's an event, semaphore, or mutex.
You can "safely" use Interlocked functions on shared memory, so there's no reason why you couldn't use them for cross-process synchronization, other than that it would be really crazy and you should probably either use threads or a real synchronization primitive.
But officially, you can.
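As a rough illustration of what "on shared memory" involves, here is a portable C++ sketch that uses std::atomic in place of the Interlocked functions. The names SharedHeader, attach and add_ref are hypothetical, and the view pointer stands in for whatever the mapping call returns (MapViewOfFile on Win32); the sketch does not create a real mapping.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <new>

// Hypothetical layout placed at the start of the shared region.
struct SharedHeader {
    std::atomic<std::uint32_t> ref_count{0};
};

// `view` is assumed to point at the start of the mapped region.
// Only the process that creates the region should pass initialize = true;
// every other process attaches to the already-constructed header.
inline SharedHeader* attach(void* view, bool initialize) {
    // The atomic must be suitably aligned inside the mapping.
    assert(reinterpret_cast<std::uintptr_t>(view) % alignof(SharedHeader) == 0);
    SharedHeader* h = initialize ? new (view) SharedHeader
                                 : static_cast<SharedHeader*>(view);
    // A lock-free atomic keeps no hidden per-process state, which is what
    // makes it usable through two different mappings of the same memory.
    assert(h->ref_count.is_lock_free());
    return h;
}

// Mirrors InterlockedIncrement(): returns the value after the increment.
inline std::uint32_t add_ref(SharedHeader* h) {
    return h->ref_count.fetch_add(1, std::memory_order_acq_rel) + 1;
}
```

The points that matter across processes are that exactly one process initializes the word, that it is aligned, and that the atomic operation is a plain hardware instruction rather than something backed by a per-process lock.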