linux thread synchronization

I am new to linux and linux threads. I have spent some time googling to try to understand the differences between all the functions available for thread synchronization. I still have some questions.

I have found all of these different types of synchronizations, each with a number of functions for locking, unlocking, testing the lock, etc.

gcc atomic operations
futexes
mutexes
spinlocks
seqlocks
rculocks
conditions
semaphores

My current (but probably flawed) understanding is this:

semaphores are process wide, involve the filesystem (virtually I assume), and are probably the slowest.

Futexes might be the base locking mechanism used by mutexes, spinlocks, seqlocks, and rculocks. Futexes might be faster than the locking mechanisms that are based on them.

Spinlocks dont block and thus avoid context swtiches. However they avoid the context switch at the expense of consuming all the cycles on a CPU until the lock is released (spinning). They should only should be used on multi processor systems for obvious reasons. Never sleep in a spinlock.

The seq lock just tells you when you finished your work if a writer changed the data the work was based on. You have to go back and repeat the work in this case.

Atomic operations are the fastest synch call, and probably are used in all the above locking mechanisms. You do not want to use atomic operations on all the fields in your shared data. You want to use a lock (mutex, futex, spin, seq, rcu) or a single atomic opertation on a lock flag when you are accessing multiple data fields.

My questions go like this:

Am I right so far with my assumptions?
Does anyone know the cpu cycle cost of the various options? I am adding parallelism to the app so we can get better wall time response at the expense of running fewer app instances per box. Performances is the utmost consideration. I don't want to consume cpu with context switching, spinning, or lots of extra cpu cycles to read and write shared memory. I am absolutely concerned with number of cpu cycles consumed.
Which (if any) of the locks prevent interruption of a thread by the scheduler or interrupt...or am I just an idiot and all synchonization mechanisms do this. What kinds of interruption are prevented? Can I block all threads or threads just on the locking thread's CPU? This question stems from my fear of interrupting a thread holding a lock for a very commonly used function. I expect that the scheduler might schedule any number of other workers who will likely run into this function and then block because it was locked. A lot of context switching would be wasted until the thread with the lock gets rescheduled and finishes. I can re-write this function to minimize lock time, but still it is so commonly called I would like to use a lock that prevents interruption...across all processors.
I am writing user code...so I get software interrupts, not hardware ones...right? I should stay away from any functions (spin/seq locks) that have the word "irq" in them.
Which locks are for writing kernel or driver code and which are meant for user mode?
Does anyone think using an atomic operation to have multiple threads move through a linked list is nuts? I am thinking to atomicly change the current item pointer to the next item in the list. If the attempt works, then the thread can safely use the data the current item pointed to before it was moved. Other threads would now be moved along the list.
futexes? Any reason to use them instead of mutexes?
Is there a better way than using a condition to sleep a thread when there is no work?
When using gcc atomic ops, specifically the test_and_set, can I get a performance increase by doing a non atomic test first and then using test_and_set to confirm? I know this will be case specific, so here is the case. There is a large collection of work items, say thousands. Each work item has a flag that is initialized to 0. When a thread has exclusive access to the work item, the flag will be one. There will be lots of worker threads. Any time a thread is looking for work, they can non atomicly test for 1. If they read a 1, we know for certain that the work is unavailable. If they read a zero, they need to perform the atomic test_and_set to confirm. So if the atomic test_and_set is 500 cpu cycles because it is disabling pipelining, causes cpu's to communicate and L2 caches to flush/fill .... and a simple test is 1 cycle .... then as long as I had a better ratio of 500 to 1 when it came to stumbling upon already completed work items....this would be a win.

I hope to use mutexes or spinlocks to sparilngly protect sections of code that I want only one thread on the SYSTEM (not jsut the CPU) to access at a time. I hope to sparingly use gcc atomic ops to select work and minimize use of mutexes and spinlocks. For instance: a flag in a work item can be checked to see if a thread has worked it (0=no, 1=yes or in progress). A simple test_and_set tells the thread if it has work or needs to move on. I hope to use conditions to wake up threads when there is work.

Thanks!

回答1:

Application code should probably use posix thread functions. I assume you have man pages so type

man pthread_mutex_init
man pthread_rwlock_init
man pthread_spin_init

Read up on them and the functions that operate on them to figure out what you need.

If you're doing kernel mode programming then it's a different story. You'll need to have a feel for what you are doing, how long it takes, and what context it gets called in to have any idea what you need to use.

回答2:

Thanks to all who answered. We resorted to using gcc atomic operations to synchronize all of our threads. The atomic ops were about 2x slower than setting a value without synchronization, but magnitudes faster than locking a mutex, changeing the value, and then unlocking the mutex (this becomes super slow when you start having threads bang into the locks...) We only use pthread_create, attr, cancel, and kill. We use pthread_kill to signal threads to wake up that we put to sleep. This method is 40x faster than cond_wait. So basicly....use pthreads_mutexes if you have time to waste.

回答3:

in addtion you should check the nexts books

Pthreads Programming: A POSIX Standard for Better Multiprocessing

and

Programming with POSIX(R) Threads

回答4:

regarding question # 8 Is there a better way than using a condition to sleep a thread when there is no work? yes i think that the best aproach instead of using sleep is using function like sem_post() and sem_wait of "semaphore.h"

regards

回答5:

A note on futexes - they are more descriptively called fast userspace mutexes. With a futex, the kernel is involved only when arbitration is required, which is what provides the speed up and savings.

Implementing a futex can be extremely tricky (PDF), debugging them can lead to madness. Unless you really, really, really need the speed, its usually best to use the pthread mutex implementation.

Synchronization is never exactly easy, but trying to implement your own in userspace makes it inordinately difficult.