I cannot understand how POSIX allows any thread to unlock (post) a semaphore. Consider the following example:
// sem1 and sem2 are POSIX semaphores,
// properly initialized for single process use
// at this point, sem2 is locked, sem1 is unlocked
// x and y are global (non-atomic, non-volatile) integer variables
// thread 1 - is executing right now
rc = sem_wait(&sem1); // succeeded, semaphore is 0 now
x = 42;
y = 142;
sem_post(&sem2);
while (true);
// thread 2. waits for sem2 to be unlocked by thread1
sem_wait(&sem2);
sem_post(&sem1);
// thread 3
sem_wait(&sem1); // wakes up when sem1 is unlocked by thread2
#ifdef __cplusplus
std::cout << "x: " << x << "; y: " << y << "\n";
#else
printf("x: %d; y: %d\n", x, y);
#endif
Now, according to everything I've read, this code is 100% kosher for Passover. In thread 3, we are guaranteed to see x as 42 and y as 142. We are protected from any race.
But this is what I can't understand. All those threads can potentially be executed on 3 different cores. And if the chip doesn't have internally strong memory ordering (ARM, PowerPC) or writes are not-atomic (x86 for unaligned data) how can thread2 on Core2 possibly request Core1 (busy with thread1) to properly release the data / complete writes / etc? As far as I know, there are no such commands!
What am I missing here?
EDIT. Please note, the suggested duplicate doesn't answer my question. It reiterates my statement but doesn't explain how the effect can possibly be achieved. In particular, it doesn't explain how Core2 can put a memory barrier on data inside Core1's cache.
It looks like it does this by waiting for signals (controlled by the OS), judging from the POSIX Threads Library for Win32.
Of course we don't have the Windows source code at our disposal. So the trail for the POSIX implementation ends there.
This is not the case for Linux.
sem_wait.c makes use of futex_wait(), and it is in the source for this function that a determination is made as to whether or not the CPU can support certain functions. For example: does the CPU have a compare-and-exchange function? But even there the full capabilities of the architecture aren't considered. Functions such as lll_futex_wait() are defined in lowlevellock.h. So for PowerPC we have powerpc/lowlevellock.h.
So the answer is probably that if it was implemented on Linux, then there is probably some implementation or workaround in the architecture-specific library that works around 'missing' instructions.
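To make the idea of architecture-specific ordering concrete, here is a minimal sketch of a toy semaphore built on C11 atomics. It is not glibc's code: the names toy_sem, toy_post, toy_wait and toy_demo are hypothetical, and it spins with sched_yield() where a real implementation would sleep on a futex. What it tries to show is that neither side reaches into the other core's cache. The posting thread orders its own earlier writes before its own update of the counter (release), and the waiting thread orders its own later reads after observing that update (acquire). The compiler lowers those orderings to whatever barrier instructions the target needs.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical toy semaphore: a spinning counter, NOT how glibc does it. */
typedef struct { atomic_int count; } toy_sem;

static void toy_post(toy_sem *s) {
    /* Release: this thread's earlier writes are ordered before the
       increment. The ordering is enforced on the posting core only. */
    atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}

static void toy_wait(toy_sem *s) {
    for (;;) {
        int c = atomic_load_explicit(&s->count, memory_order_relaxed);
        /* Acquire on success: this thread's later reads are ordered
           after the decrement. Again a purely local constraint. */
        if (c > 0 && atomic_compare_exchange_weak_explicit(
                         &s->count, &c, c - 1,
                         memory_order_acquire, memory_order_relaxed))
            return;
        sched_yield(); /* a real implementation would sleep instead */
    }
}

static toy_sem ready;   /* static storage: the count starts at 0 */
static int payload;     /* plain, non-atomic data */

static void *producer(void *arg) {
    (void)arg;
    payload = 42;       /* ordinary write ...                      */
    toy_post(&ready);   /* ... published by the release increment  */
    return NULL;
}

/* Returns the payload value observed by the waiting thread. */
int toy_demo(void) {
    pthread_t t;
    int rc, seen;
    payload = 0;
    atomic_store(&ready.count, 0);
    rc = pthread_create(&t, NULL, producer, NULL);
    assert(rc == 0);
    (void)rc;
    toy_wait(&ready);   /* acquire pairs with the release in toy_post */
    seen = payload;
    pthread_join(t, NULL);
    return seen;
}
```

Because release/acquire pairs chain transitively, the same reasoning covers a post by thread 1, a wait and post by thread 2, and a wait by thread 3. Compile with -pthread.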
...because that is how it is defined.
Any thread may post a unit to a semaphore.
A semaphore is not primarily a locking mechanism, so 'unlock' is inappropriate.
Semaphores support post and wait.
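A small sketch of the "counter, not lock" point, assuming a platform with unnamed POSIX semaphores and a working sem_getvalue() (Linux, for instance). The function name count_after_posts is hypothetical. The same thread posts three times without ever having waited, which would make no sense for a mutex but is perfectly valid here.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>

/* Returns the counter value seen after three posts, or -1 on error. */
int count_after_posts(void) {
    sem_t s;
    int value = -1;
    int rc;

    if (sem_init(&s, 0, 0) != 0)    /* process-private, initial count 0 */
        return -1;

    for (int i = 0; i < 3; i++)
        sem_post(&s);               /* no "owner" is required to post */

    sem_getvalue(&s, &value);       /* the count is now 3 */

    for (int i = 0; i < 3; i++)
        sem_wait(&s);               /* each wait consumes one unit */

    rc = sem_trywait(&s);           /* nothing left: fails, no blocking */
    assert(rc == -1 && errno == EAGAIN);
    (void)rc;

    sem_destroy(&s);
    return value;
}
```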
In the case of complex architectures, e.g. many multi-core processors, the implementation of semaphores and other inter-thread synchronization mechanisms may get quite complex. It might, for example, be necessary to send a synchronization message via the inter-processor driver, hardware-interrupting the other core(s) to force them to handle it.
If the initial values are sem1 = 1 and sem2 = 0, then either thread 1 or thread 3 can take sem1; it depends on which thread gets there first. Suppose thread 3 executes first: sem1 then becomes 0, thread 1 cannot take it, and thread 2 cannot proceed either because it depends on thread 1. And if the initial value of sem1 is 0, then no thread can proceed at all. I think this will help.
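For anyone who wants to try the scenario with sem1 = 1 and sem2 = 0, here is a runnable sketch of the three threads from the question. Two things are my assumptions and not part of the original: a helper semaphore t1_has_sem1 forces thread 1 to win sem1 before thread 3 is even started (otherwise the outcome depends on which thread enters first, as described above), and thread 1 returns where the question spins in while (true) so that the program can be joined. run_chain is a hypothetical name.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

static sem_t sem1, sem2;
static sem_t t1_has_sem1;    /* hypothetical helper, not in the question */
static int x, y;             /* plain, non-atomic globals */
static int seen_x, seen_y;   /* what thread 3 observed */

static void *thread1(void *arg) {
    (void)arg;
    sem_wait(&sem1);         /* takes the only unit: sem1 is 0 now */
    sem_post(&t1_has_sem1);  /* only now may thread 3 be started */
    x = 42;
    y = 142;
    sem_post(&sem2);         /* posts a semaphore it never waited on */
    return NULL;             /* the question spins forever here */
}

static void *thread2(void *arg) {
    (void)arg;
    sem_wait(&sem2);         /* woken by thread 1 */
    sem_post(&sem1);         /* posts sem1 although thread 1 took it */
    return NULL;
}

static void *thread3(void *arg) {
    (void)arg;
    sem_wait(&sem1);         /* woken by thread 2 */
    seen_x = x;
    seen_y = y;
    return NULL;
}

/* Runs the chain once and stores what thread 3 saw; returns 0 on success. */
int run_chain(int *out_x, int *out_y) {
    pthread_t t1, t2, t3;
    int rc;

    x = y = seen_x = seen_y = 0;
    rc = sem_init(&sem1, 0, 1);        assert(rc == 0);
    rc = sem_init(&sem2, 0, 0);        assert(rc == 0);
    rc = sem_init(&t1_has_sem1, 0, 0); assert(rc == 0);

    rc = pthread_create(&t1, NULL, thread1, NULL); assert(rc == 0);
    rc = pthread_create(&t2, NULL, thread2, NULL); assert(rc == 0);
    sem_wait(&t1_has_sem1);            /* thread 1 holds sem1's unit */
    rc = pthread_create(&t3, NULL, thread3, NULL); assert(rc == 0);
    (void)rc;

    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);            /* makes seen_x/seen_y safe to read */

    sem_destroy(&sem1);
    sem_destroy(&sem2);
    sem_destroy(&t1_has_sem1);

    *out_x = seen_x;
    *out_y = seen_y;
    return 0;
}
```

Compile with -pthread and call run_chain from main. Removing the t1_has_sem1 handshake brings back the race on sem1 between thread 1 and thread 3, and if thread 3 wins, threads 1 and 2 block forever.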