Monitor.Pulse and PulseAll require that the lock they operate on is held at the time of the call. This requirement seems unnecessary and detrimental to performance. My first idea was that it results in 2 wasted context switches, but this was corrected by nobugz below (thanks). I am still unsure whether there is a potential for wasted context switches: the other thread(s) that were waiting on the monitor are already available to the scheduler, but if they are scheduled, they will only be able to run a few instructions before hitting the mutex and having to context-switch again. This would be much simpler and faster if the lock were released before invoking Monitor.Pulse.
Pthread condition variables implement the same concept, but they do not have the limitation described above: you can call pthread_cond_broadcast even if you do not own the mutex. I see this as evidence that the requirement is not justified.
Edit: I realize that a lock is required to protect the shared resource that is usually changed before the Monitor.Pulse. What I was trying to say is that the lock could be released after the access to the resource but before the Pulse, if Monitor supported this. That would limit the lock to the shortest possible time, namely the time during which the shared resource is accessed. As such:
void f(Item i)
{
    lock(somequeue) {
        somequeue.add(i);
    }
    Monitor.Pulse(somequeue); // error: the lock is no longer held (throws SynchronizationLockException)
}
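For reference, here is a minimal sketch of what the snippet above does at run time, next to the form that Monitor accepts. The class name PulseDemo, both method names, and the use of Queue<int> with Enqueue (standing in for somequeue.add) are assumptions made for illustration only.

```csharp
using System.Collections.Generic;
using System.Threading;

static class PulseDemo
{
    // Hypothetical stand-in for the question's "somequeue".
    static readonly Queue<int> somequeue = new Queue<int>();

    // Mirrors f() from the question: Pulse is called after the lock is released.
    // Returns true if the runtime rejected the call.
    public static bool PulseOutsideLockThrows(int item)
    {
        lock (somequeue)
        {
            somequeue.Enqueue(item);
        }
        try
        {
            Monitor.Pulse(somequeue); // calling thread no longer owns the lock
            return false;
        }
        catch (SynchronizationLockException)
        {
            return true;
        }
    }

    // The accepted form: Pulse while the lock is still held.
    public static void PulseInsideLock(int item)
    {
        lock (somequeue)
        {
            somequeue.Enqueue(item);
            Monitor.Pulse(somequeue);
        }
    }
}
```

The only difference between the two methods is whether Pulse sits inside or outside the lock block.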
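Wait is designed to be used with a conditional check. If the conditional check were not done within a lock, it would be possible for the following sequence of events to occur:
1. Thread #1 checks the condition and decides that it needs to wait.
2. Thread #2 changes the condition and calls Pulse; no thread is waiting yet, so the Pulse has no effect.
3. Thread #1 begins its Wait, even though the condition it is waiting for has already been satisfied.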
Once that sequence of events occurs, it's entirely possible that nothing will ever Pulse the lock again (unless the condition later changes back so that a wait would be needed, and then changes once more). Thread #1 could thus wait forever for an event that never arrives.
Putting the conditional check and the Wait within a lock avoids this danger, since there is no way for another thread to change the condition between the time the condition is checked and the time the Wait begins. Consequently, another thread that changes the condition and does a Pulse can be assured that the first thread either checked the condition after it was changed (and thus avoided the wait) or else was executing the Wait when the Pulse executed (and was thus able to resume).
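A minimal sketch of that pattern, assuming a hypothetical BlockingQueueSketch<T> class with a private gate object (neither name comes from the question): the check and the Wait sit inside the same lock, and so do the state change and the Pulse.

```csharp
using System.Collections.Generic;
using System.Threading;

// Hypothetical illustration of the check-then-Wait pattern; not a production queue.
class BlockingQueueSketch<T>
{
    private readonly Queue<T> items = new Queue<T>();
    private readonly object gate = new object();

    public void Put(T item)
    {
        lock (gate)
        {
            items.Enqueue(item);  // change the condition...
            Monitor.Pulse(gate);  // ...and signal it under the same lock
        }
    }

    public T Take()
    {
        lock (gate)
        {
            // The condition is checked while holding the lock, so no other thread
            // can change it between this check and the start of the Wait.
            // Wait releases the lock while blocked and reacquires it before returning.
            while (items.Count == 0)
            {
                Monitor.Wait(gate);
            }
            return items.Dequeue();
        }
    }
}
```

The while loop (rather than a single if) re-checks the condition after every wake-up, which keeps Take correct even if another consumer has emptied the queue in the meantime.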
I have found the answer in this paper:
http://research.microsoft.com/pubs/64242/implementingcvs.pdf
it states:
The paper as a whole is a bit vague and leaves out many implementation details; it stays at a pseudocode/academic level. But apparently the people who wrote it had some responsibility for the actual .NET implementation.
Roughly put: the signal is just a logical, user-level operation and does not fire a primitive such as a condition variable signal right away. It only does so on exit from the lock scope, so there are no performance issues. It is truly surprising, though, for anyone used to manipulating condition variables directly.
The reason has to do with memory barriers and guaranteeing thread safety.
Shared variables (conditions) that are used to determine whether a Pulse() is needed will be checked by all threads involved. Without a memory barrier, the changes might be kept in a register and be invisible from one thread to another. Reads and writes can also be re-ordered when viewed across threads.
However, accesses to variables from within a lock are protected by a memory barrier, so the changes are visible to all related threads. All operations within the lock appear to execute atomically from the perspective of other threads that use the same lock.
Also, contrary to what you postulated, multiple context switches aren't required. Waiting threads are put in a (nominally FIFO) queue, and while they're triggered with Pulse(), they aren't fully runnable until the lock is relinquished (again, in part due to memory barriers).
For a good discussion of the issues, see: http://www.albahari.com/threading/part4.aspx#_Wait_and_Pulse
Your assumption that the Pulse() call invokes a thread switch is not correct. It merely moves a thread from the wait queue to the ready queue. The Exit() call makes the switch, to the thread that's first in the ready queue.
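The observable side of this can be sketched as follows. The class PulseThenExitDemo and its members are hypothetical names, and the code only shows that a pulsed thread cannot get past its Wait until the pulsing thread leaves the lock; it says nothing about how the queues are implemented internally.

```csharp
using System.Threading;

static class PulseThenExitDemo
{
    static readonly object gate = new object();
    static bool ready;          // the condition the waiter is waiting for
    static bool waiterResumed;  // set by the waiter once it gets past its Wait

    // Returns true if the waiter managed to proceed while the pulsing
    // thread still held the lock (which should never happen).
    public static bool WaiterRanWhileLockHeld()
    {
        ready = false;
        waiterResumed = false;

        var waiter = new Thread(() =>
        {
            lock (gate)
            {
                while (!ready)
                {
                    Monitor.Wait(gate);
                }
                waiterResumed = true;
            }
        });
        waiter.Start();

        bool resumedEarly;
        lock (gate)
        {
            ready = true;
            Monitor.Pulse(gate);          // waiter (if already waiting) is made ready, nothing more
            Thread.Sleep(100);            // keep holding the lock for a while
            resumedEarly = waiterResumed; // still false: the waiter needs the lock first
        }                                 // leaving the lock lets the waiter continue

        waiter.Join();
        return resumedEarly;
    }
}
```

If the waiter has not reached its Wait by the time the lock is taken, the Pulse finds nobody waiting, and the waiter later sees ready already set and skips the Wait; the result is the same either way.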