How can semaphores be implemented on multi-core systems?

Posted 2019-06-14 20:57

I cannot understand how POSIX allows any thread to unlock (post) a semaphore. Consider the following example:

// sem1 and sem2 are POSIX semaphores,
// properly initialized for single process use
// at this point, sem2 is locked, sem1 is unlocked 
// x and y are global (non-atomic, non-volatile) integer variables
// thread 1 - is executing right now

rc = sem_wait(&sem1); // succeeded, semaphore is 0 now
x = 42;
y = 142;
sem_post(&sem2);
while (true);

// thread 2. waits for sem2 to be unlocked by thread1 
sem_wait(&sem2);
sem_post(&sem1);

// thread 3
sem_wait(&sem1); // wakes up when sem1 is unlocked by thread2
#ifdef __cplusplus
std::cout << "x: " << x << "; y: " << y << "\n";
#else
printf("x: %d; y: %d\n", x, y);
#endif

Now, according to everything I've read, this code is 100% kosher for Passover. In thread 3, we are guaranteed to see x as 42 and y as 142. We are protected from any race.
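For anyone who wants to try this, the scenario can be turned into a finite, runnable sketch. It assumes Linux with unnamed POSIX semaphores (sem_init) and pthreads. The names sem1_taken, struct seen and run_scenario are hypothetical helpers that are not part of the example above: sem1_taken only makes sure thread 1 really holds sem1 before thread 3 starts, and thread 1 returns instead of spinning forever so that the program terminates.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/* Plain (non-atomic, non-volatile) globals, as in the example above. */
static int x, y;
static sem_t sem1, sem2;
static sem_t sem1_taken;   /* hypothetical helper, not in the original example */

struct seen { int x, y; }; /* hypothetical: what thread 3 observed */

static void *thread1(void *arg) {
    (void)arg;
    sem_wait(&sem1);        /* takes the only unit; sem1 is 0 now */
    sem_post(&sem1_taken);  /* tell the launcher that sem1 is held */
    x = 42;
    y = 142;
    sem_post(&sem2);
    return NULL;            /* the original spins forever at this point */
}

static void *thread2(void *arg) {
    (void)arg;
    sem_wait(&sem2);        /* waits for thread 1's post */
    sem_post(&sem1);        /* posts a semaphore it never waited on */
    return NULL;
}

static void *thread3(void *arg) {
    struct seen *out = arg;
    sem_wait(&sem1);        /* wakes up once thread 2 has posted sem1 */
    out->x = x;
    out->y = y;
    return NULL;
}

/* Runs the scenario once. Returns 0 on success, -1 if setup failed. */
static int run_scenario(struct seen *out) {
    pthread_t t1, t2, t3;
    assert(out != NULL);
    x = y = 0;
    if (sem_init(&sem1, 0, 1) != 0 || sem_init(&sem2, 0, 0) != 0 ||
        sem_init(&sem1_taken, 0, 0) != 0)
        return -1;
    if (pthread_create(&t1, NULL, thread1, NULL) != 0) return -1;
    sem_wait(&sem1_taken);  /* from here on, thread 3 cannot win sem1 first */
    if (pthread_create(&t2, NULL, thread2, NULL) != 0) return -1;
    if (pthread_create(&t3, NULL, thread3, out) != 0) return -1;
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    pthread_join(t3, NULL);
    sem_destroy(&sem1);
    sem_destroy(&sem2);
    sem_destroy(&sem1_taken);
    return 0;
}
```

Calling run_scenario(&s) from main and printing s.x and s.y reproduces the printf in thread 3. The writes to x and y happen after the post on sem1_taken, so the only thing ordering them before thread 3's reads is the sem2/sem1 chain.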

But this is what I can't understand. All of these threads can potentially run on three different cores. If the chip doesn't have strong memory ordering internally (ARM, PowerPC), or if writes are not atomic (x86 with unaligned data), how can thread 2 on Core 2 possibly ask Core 1 (busy with thread 1) to properly release the data, complete its writes, and so on? As far as I know, there are no such instructions!

What am I missing here?

EDIT: Please note that the suggested duplicate doesn't answer my question. It reiterates my statement, but doesn't explain how the effect can possibly be achieved. In particular, it doesn't explain how Core 2 can put a memory barrier on data inside Core 1's cache.

3 answers
Melony?
#2 · 2019-06-14 21:20

Judging by the POSIX Threads Library for Win32, it looks like this is done by waiting for signals controlled by the OS:

       * If the sema is posted between us being cancelled and us locking
       * the sema again above then we need to consume that post but cancel
       * anyway. If we don't get the semaphore we indicate that we're no
       * longer waiting.
       */
      if (*((sem_t *)sem) != NULL && !(WaitForSingleObject(s->sem, 0) == WAIT_OBJECT_0))
    {
      ++s->value;
#if defined(NEED_SEM)
      if (s->value > 0)
        {
          s->leftToUnblock = 0;
        }
#else
      /*
       * Don't release the W32 sema, it doesn't need adjustment
       * because it doesn't record the number of waiters.
       */
#endif /* NEED_SEM */
    }
      (void) pthread_mutex_unlock (&s->lock);
    }
}

Of course we don't have the Windows source code at our disposal. So the trail for the POSIX implementation ends there.

This is not the case for Linux. sem_wait.c makes use of futex_wait(), and the source for that function determines whether the CPU supports certain operations, for example whether it has a compare-and-exchange instruction. Even there, the full capabilities of the architecture aren't considered. Functions such as lll_futex_wait() are defined in lowlevellock.h, so for PowerPC we have powerpc/lowlevellock.h, which contains, for example, the following snippet:

/* Set *futex to ID if it is 0, atomically.  Returns the old value */
#define __lll_robust_trylock(futex, id) \
  ({ int __val;                                   \
     __asm __volatile ("1:  lwarx   %0,0,%2" MUTEX_HINT_ACQ "\n"          \
               "    cmpwi   0,%0,0\n"                 \
               "    bne 2f\n"                     \
               "    stwcx.  %3,0,%2\n"                \
               "    bne-    1b\n"                     \
               "2:  " __lll_acq_instr                 \
               : "=&r" (__val), "=m" (*futex)                 \
               : "r" (futex), "r" (id), "m" (*futex)              \
               : "cr0", "memory");                    \
     __val;                                   \
  })

So the answer is probably that, on Linux at least, the architecture-specific part of the library contains an implementation or workaround that makes up for any 'missing' instructions.
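To make the idea concrete, here is a minimal sketch of a counting semaphore built on C11 atomics. It is not the glibc implementation (which sleeps on a futex rather than spinning), and toy_sem, toy_sem_post, toy_sem_wait and toy_demo are hypothetical names. The point it tries to show is that no core ever reaches into another core's cache: the posting thread issues a release operation on its own core, the waiting thread issues an acquire operation on its own core, and on weakly ordered hardware the compiler turns those into the appropriate barrier or exclusive load/store instructions.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical toy semaphore: an atomic counter, waiters spin and yield. */
typedef struct { atomic_int count; } toy_sem;

static void toy_sem_post(toy_sem *s) {
    /* Release: writes made by this thread before the post are ordered
       before the increment, on the posting core itself. */
    atomic_fetch_add_explicit(&s->count, 1, memory_order_release);
}

static void toy_sem_wait(toy_sem *s) {
    for (;;) {
        int c = atomic_load_explicit(&s->count, memory_order_relaxed);
        /* Acquire on success: reads made by this thread after the wait
           cannot be satisfied with data older than the matching post. */
        if (c > 0 &&
            atomic_compare_exchange_weak_explicit(&s->count, &c, c - 1,
                                                  memory_order_acquire,
                                                  memory_order_relaxed))
            return;
        sched_yield();  /* a real implementation would sleep in the kernel */
    }
}

static int payload;     /* plain int, like x in the question */
static toy_sem ready;

static void *producer(void *arg) {
    (void)arg;
    payload = 42;       /* ordinary store ... */
    toy_sem_post(&ready); /* ... published by the release in post */
    return NULL;
}

static void *consumer(void *arg) {
    toy_sem_wait(&ready);
    *(int *)arg = payload; /* ordinary load, ordered after the acquire */
    return NULL;
}

/* Returns the value the consumer saw, or -1 if a thread could not start. */
static int toy_demo(void) {
    pthread_t p, c;
    int seen = -1;
    payload = 0;
    atomic_store(&ready.count, 0);
    if (pthread_create(&p, NULL, producer, NULL) != 0) return -1;
    if (pthread_create(&c, NULL, consumer, &seen) != 0) {
        pthread_join(p, NULL);
        return -1;
    }
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return seen;
}
```

Because happens-before is transitive, the same reasoning covers the three-thread chain in the question: thread 1's post on sem2 orders its writes before thread 2's wait, and thread 2's post on sem1 orders everything thread 2 has already observed before thread 3's wait.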

倾城 Initia
#3 · 2019-06-14 21:30

...because that is how it is defined.

Any thread may post a unit to a semaphore.

A semaphore is not primarily a locking mechanism, so 'unlock' is inappropriate.

Semaphores support post and wait.

On complex architectures, e.g. many multi-core processors, the implementation of semaphores and other inter-thread synchronization mechanisms may get quite complex. It might, for example, be necessary to send a synchronization message via the inter-processor driver, hardware-interrupting the other core(s) to force them to handle the synchronization.

唯我独甜
#4 · 2019-06-14 21:39

Suppose the initial values are sem1 = 1 and sem2 = 0. Then either thread 1 or thread 3 can take sem1, depending on which one gets there first. If thread 3 runs first, sem1 drops to 0, so thread 1 cannot take it, and thread 2 cannot proceed either because it depends on thread 1 posting sem2. And if sem1 starts at 0, no thread can take it at all. I hope this helps.
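The effect of the initial value can be checked with a small sketch that uses the non-blocking sem_trywait. It assumes a platform where sem_init works for unnamed semaphores (for example Linux), and count_successful_waits is a hypothetical helper name.

```c
#include <assert.h>
#include <semaphore.h>

/* Creates a semaphore with the given initial value and reports how many
   of `attempts` non-blocking waits succeed. Returns -1 if sem_init fails. */
static int count_successful_waits(unsigned initial, int attempts) {
    sem_t sem;
    int ok = 0;
    if (sem_init(&sem, 0, initial) != 0)
        return -1;
    for (int i = 0; i < attempts; i++) {
        if (sem_trywait(&sem) == 0)  /* takes one unit if any is available */
            ok++;
    }
    sem_destroy(&sem);
    return ok;
}
```

With an initial value of 1, only the first of two attempts gets the semaphore, whichever thread makes it. With an initial value of 0, neither attempt succeeds until someone posts.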
