Does mutex_unlock function as a memory fence?

Posted 2019-03-16 04:36

Question:

The situation I'll describe is occurring on an iPad 4 (ARMv7s), using the POSIX library to lock/unlock a mutex. I've seen similar things on other ARMv7 devices, though (see below), so I suppose any solution will require a more general look at the behaviour of mutexes and memory fences on ARMv7.

Pseudo code for the scenario:

Thread 1 – Producing Data:

void ProduceFunction() {
  MutexLock();
  int TempProducerIndex = mSharedProducerIndex; // Take a copy of the int member variable for Producers Index
  mSharedArray[TempProducerIndex++] = NewData; // Copy new Data into array at Temp Index 
  mSharedProducerIndex = TempProducerIndex; // Signal consumer data is ready by assigning new Producer Index to shared variable
  MutexUnlock();
}

Thread 2 – Consuming Data:

void ConsumingFunction () {
  while (mConsumerIndex != mSharedProducerIndex) {
    doWorkOnData (mSharedArray[mConsumerIndex++]);
  }
}

Previously (when the problem cropped up on iPad 2), I believed that mSharedProducerIndex = TempProducerIndex was not being performed atomically, and hence changed to use an AtomicCompareAndSwap to assign mSharedProducerIndex. This has worked up until this point, but it turns out I was wrong and the bug has come back. I guess the 'fix' just changed some timing.

I have now come to the conclusion that the actual problem is an out of order execution of the writes within the mutex lock, i.e. if either the compiler or the hardware decided to reorder:

mSharedArray[TempProducerIndex++] = NewData; // Copy new Data into array at Temp Index 
mSharedProducerIndex = TempProducerIndex;  // Signal consumer data is ready by assigning new Producer Index to shared variable

... to:

mSharedProducerIndex = TempProducerIndex; // Signal consumer data is ready by assigning new Producer Index to shared variable
mSharedArray[TempProducerIndex++] = NewData; // Copy new Data into array at Temp Index 

... and then the consumer interleaved the producer, the data would not have yet been written when the consumer tried to read it.

After some reading on memory barriers, I therefore thought I’d try moving the signal to the consumer outside the mutex_unlock, believing that the unlock would produce a memory barrier/fence which would ensure mSharedArray had been written to:

mSharedArray[TempProducerIndex++] = NewData;  // Copy new Data into array at Temp Index 
MutexUnlock();
mSharedProducerIndex = TempProducerIndex; // Signal consumer data is ready by assigning new Producer Index to shared variable

This, however, still fails, and leads me to question if a mutex_unlock will definitely act as a write fence or not?

I've also read an article from HP which suggested that compilers could move code into (but not out of) critical sections. So even after the above change, the write of mSharedProducerIndex could happen before the barrier. Is there any mileage to this theory?

By adding an explicit fence the problem goes away:

mSharedArray[TempProducerIndex++] = NewData; // Copy new Data into array at Temp Index 
OSMemoryBarrier();
mSharedProducerIndex = TempProducerIndex; // Signal consumer data is ready by assigning new Producer Index to shared variable

I therefore think I understand the problem, and that a fence is required, but any insight into the behaviour of the unlock and why it doesn’t appear to be performing a barrier would be really useful.

EDIT:

Regarding the lack of a mutex in the consumer thread: I'm relying on the write of the int mSharedProducerIndex being a single instruction and therefore hoping the consumer would read either the new or the old value. Either is a valid state, and provided that mSharedArray is written in sequence (i.e. prior to writing mSharedProducerIndex) this would be OK, but from what has been said so far, I can’t rely on this.

By the same logic it appears that the current barrier solution is also flawed, as the mSharedProducerIndex write could be moved inside the barrier and could therefore potentially be incorrectly re-ordered.

Is it recommended to add a mutex to the consumer, just to act as a read barrier, or is there a pragma or instruction for disabling out-of-order execution on the producer, like EIEIO on PPC?

Answer 1:

Your producers are synchronized, but you don't do any synchronization on the consuming side (you need to synchronize memory with barriers there as well). So even if you have perfect memory barriers for the producers, those barriers won't help the consumer.

In your code, you can be hit by compiler reordering, hardware reordering, or even a stale value of mSharedProducerIndex on the other core running Thread #2.
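One way to give the consumer the synchronization it is missing is to have it take the same mutex as the producer. Below is a minimal sketch of that idea in C++, not the asker's actual code: SharedQueue, produce, consumeAvailable, runDemo and kCapacity are hypothetical names, std::mutex stands in for the POSIX mutex, and the array is a fixed size with no wrap-around handling.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <thread>
#include <vector>

// Assumed capacity; the sketch does not handle wrap-around.
constexpr std::size_t kCapacity = 1024;

// Hypothetical stand-ins for the question's shared members.
struct SharedQueue {
    std::mutex lock;                // guards both fields below
    int data[kCapacity] = {};       // plays the role of mSharedArray
    std::size_t producerIndex = 0;  // plays the role of mSharedProducerIndex
};

// Producer: write the element and publish the new index under the lock.
void produce(SharedQueue& q, int value) {
    std::lock_guard<std::mutex> guard(q.lock);
    assert(q.producerIndex < kCapacity);
    q.data[q.producerIndex] = value;
    ++q.producerIndex;
}

// Consumer: take the same lock to read the index and copy out new elements.
// The caller works on the returned batch after the lock has been released.
std::vector<int> consumeAvailable(SharedQueue& q, std::size_t& consumerIndex) {
    std::vector<int> batch;
    std::lock_guard<std::mutex> guard(q.lock);
    while (consumerIndex != q.producerIndex) {
        batch.push_back(q.data[consumerIndex++]);
    }
    return batch;
}

// Runs one producer and one consumer thread; returns the sum the consumer saw.
long runDemo(int count) {
    SharedQueue q;
    long sum = 0;  // written only by the consumer thread, read after join
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) produce(q, i);
    });
    std::thread consumer([&] {
        std::size_t consumerIndex = 0;
        while (consumerIndex < static_cast<std::size_t>(count)) {
            for (int v : consumeAvailable(q, consumerIndex)) sum += v;
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

Because both sides hold the lock while touching the index and the array, the consumer can never observe a new index without the matching data. The cost is that the consumer now contends with the producers for the lock, which is why the batch is copied out and processed outside the critical section.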

You should read Chapter 11: Memory Ordering from Cortex™-A Series Programmer’s Guide, especially 11.2.1 Memory barrier use example.

I think your problem is that you are getting partial updates in the consumer thread. What is inside the critical section in the producer is not atomic, and it can be reordered.

By not atomic I mean that if mSharedArray[TempProducerIndex++] = NewData; is not a single word store (as it would be if NewData had type int), it might be done in several steps, which another core can see as partial updates.

By reordering I mean that a mutex provides barriers on entry and exit but does not impose any ordering inside the critical section. Since you don't have any special construct on the consumer side, you can see mSharedProducerIndex updated but still see partial updates to mSharedArray[mConsumerIndex]. A mutex only guarantees memory visibility after execution leaves the critical section.

I believe this also explains why it works when you add OSMemoryBarrier(); inside the critical section: the CPU is forced to write the data into mSharedArray before it updates mSharedProducerIndex, so when the other core/thread sees the new mSharedProducerIndex, we know that mSharedArray has been fully written because of the barrier.

I think your implementation with OSMemoryBarrier(); is correct, assuming you have many producers and one consumer. I disagree with the comments suggesting a memory barrier in the consumer, since I believe that won't fix partial updates or reordering happening inside the producer's critical section.

As an answer to the question in your title: in general, as far as I know, mutexes have a read barrier before they enter and a write barrier after they leave.



Answer 2:

The "theory" is correct: writes can be moved from after a write fence to before it.

The fundamental problem with your code is that there is no synchronization at all in thread 2. You read mSharedProducerIndex without a read barrier, so who knows what value you'll get. Nothing that you do in thread 1 will solve that.
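To illustrate what synchronization in both threads could look like without a mutex on the consumer, here is a rough sketch using C++11 atomics rather than OSMemoryBarrier. It assumes a single producer and a single consumer and a fixed-size array with no wrap-around; Channel, publish, drain, runChannelDemo and kSlots are hypothetical names, not from the question.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>

// Assumed capacity; the sketch does not handle wrap-around.
constexpr std::size_t kSlots = 1024;

// Hypothetical stand-ins for the question's shared members.
struct Channel {
    int data[kSlots] = {};                      // plays the role of mSharedArray
    std::atomic<std::size_t> producerIndex{0};  // plays the role of mSharedProducerIndex
};

// Single producer: fill the slot, then publish the index with a release store.
// The release store keeps the data write from being ordered after it.
void publish(Channel& c, int value) {
    std::size_t i = c.producerIndex.load(std::memory_order_relaxed);
    assert(i < kSlots);
    c.data[i] = value;
    c.producerIndex.store(i + 1, std::memory_order_release);
}

// Single consumer: the acquire load pairs with the release store, so every
// slot below the index it returns is fully written before it is read here.
long drain(Channel& c, std::size_t& consumerIndex) {
    long sum = 0;
    std::size_t limit = c.producerIndex.load(std::memory_order_acquire);
    while (consumerIndex != limit) {
        sum += c.data[consumerIndex++];
    }
    return sum;
}

// Runs one producer and one consumer thread; returns the sum the consumer saw.
long runChannelDemo(int count) {
    Channel c;
    long sum = 0;  // written only by the consumer thread, read after join
    std::thread producer([&] {
        for (int i = 1; i <= count; ++i) publish(c, i);
    });
    std::thread consumer([&] {
        std::size_t consumerIndex = 0;
        while (consumerIndex < static_cast<std::size_t>(count)) {
            sum += drain(c, consumerIndex);
        }
    });
    producer.join();
    consumer.join();
    return sum;
}
```

The point of the pairing is that the barrier on the writing side only helps if the reading side also orders its loads: the release store covers the producer, and the acquire load is the read barrier that thread 2 was missing. With several producers, the producer side would still need a mutex or an atomic read-modify-write to claim slots.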