Is there an existing solution for these particular

2020-08-22 17:41发布

问题:

I've had the need for a multi-threaded data structure that supports these claims:

  • Allows multiple concurrent readers and writers
  • Is sorted
  • Is easy to reason about

Fulfilling multiple readers and one writer is a lot easier, but I really would wan't to allow multiple writers.

I've been doing research into this area, and I'm aware of ConcurrentSkipList (by Lea based on work by Fraser and Harris) as it's implemented in Java SE 6. I've also implemented my own version of a concurrent Skip List based on A Provably Correct Scalable Concurrent Skip List by Herlihy, Lev, Luchangco and Shavit.

These two implementations are developed by people that are light years smarter then me, but I still (somewhat ashamed, because it is amazing work) have to ask the question if these are the two only viable implementations of a concurrent multi reader/writer data structures available today?

回答1:

Sounds to me like your making this problem too hard for yourself. Consider the following:

  • Its pretty easy to implement immutable versions of many data structures, especially trees. Immutable data structures have the benefit that, by virtue of being immutable, one thread can't modify the collection under another threads nose. Immutability = no race conditions = no locks = no deadlocking. Awesomeness.

    See Okasaki's Purely Functional Data Structures, which provides ML and Haskell implementations of heaps, balanced trees, stacks, queues, and some other data structures.

  • Threads can't see changes to an immutable data structure made in other threads. They can, however, notify one another explicitly of changes using message-passing concurrency.

Locks and mutexes are too low-level, and mutable state is pretty much the enemy of multithreaded programming. If you think about whatever problem your trying to solve in terms immutability and message passing, then it'll become 1000x easier for you.



回答2:

Well, you already identified the one I would typically suggest - the concurrent skiplist, but the absence of other specific requirements other than the three above, I think a simple linked list with per-node mutexes will work:

Each node contains one element (or a reference), and one simple mutex. I'll assume Java here, because it helps avoid race conditions around node reclamation.

Searching through the list consists of iterating through the nodes from the head, no need to get any locks (although you'll need to ensure visibility between threads at some point - you can choose how frequently this happens - once per search may be good enough).

To add an item, do a search until you find the immediate predecessor and successor nodes for the value you want to insert, lock the mutex associated with the prior node, check the successor node again (it might have changed out from under you), then splice in the new node.

Deletion works similarly - find the predecessor node for the node you want to delete, lock the predecessor node, check that it is still the predecessor node, and take it out of the tree.

This structure is sorted (to the extent that it is correct - I haven't proven that!).

This structure clearly allows multiple readers (readers are never blocked for any reason), and multiple writers, although writers trying to manipulate the same portion of the list (e.g., two threads inserting nodes which have the same splice point) will wait on each other.

The structure seems relatively easy to reason about - being based on a single linked list, with a fairly simple locking structure and with a few simple invariants. I haven't spend more than a few minutes reasoning about its correctness, however. You can make it even simpler to reason about, at the cost of performance, by making the locking policy more heavyweight - for insertion and deletion lock each node during iteration and then lock the successor before unlocking the prior node - in this way you'll have both nodes locked when you find the splice or deletion point, so no "double-check, backtrack" is needed.

You may be able to get rid of the locks completely and use a lock-free list while maintaining your "must be sorted" condition, but I haven't thought about it in depth - at a minimum I suspect it will be "harder to reason about".

In C++ the structure is more involved, since you can't rely on the GC to keep the nodes around as long as readers might be looking at them, so the simple strategy of allowing the readers to mosey around in a lock-free manner doesn't fly, if you ever want to delete nodes. You can adjust it by making readers take the lock of each visited node, but this sucks for both performance (obvious) and concurrency (because although you can have multiple readers and writers in some basic way, they can never pass each other anymore, so practical concurrency is heavily limited).

An alternative would be to use rwlocks in each node, readers taking read locks only and inserters/deleters taking read locks to find the position to work on, then upgrading to write. Currency is at least improved since readers can pass readers, but writers still block readers (so all readers that are iterating at a position before some node on which a write operation starts will not be able to iterate past that node until the operation completes).

Now, all that said (whew!) I should mention that other that "easy to reason about" this structure seems pretty much inferior to the concurrent skip list in every material way, with the possible exception of slightly less memory usage (maybe). It doesn't have the log(N) search behavior, in particular. It is not fully lock-free (writers may wait on writers in some cases). I'm not even sure it is that much easier to reason about as far as concurrency goes, although the underlying structure is simple.

I think if you really wanted you could extend this kind of "per-node" locking to something like an rb-tree if you wanted, but it not easy. In particular, you need to find some kind of node to lock prior to any tree rotations that is "high enough" that you can be sure that any thread wanting to perform a modification that will affect the correctness of your rotation will also try to lock the same node. In the linked list, this is the "predecessor node" - in an AVL or RB-tree it is not so simple.



回答3:

I created a lock-free data structure in F# with similar requirements for my work recently. Specifically, it was a sorted dictionary that mapped int keys to int values where the values were counters and the two primitive operations were incrementing the count associated with a given key and obtaining an array of the current key-value pairs.

I implemented this in F# as a value of the type Map<int, int ref> ref which is a mutable reference to an immutable map from int keys to mutable references to int values. Readers concurrently read the reference to obtain the current map, lookup the key in it and dereference to obtain the associated int value. Writers concurrently read the reference, lookup the key and atomically increment it if it exists or create a new map with the new key-value pair and use CAS to replace the root reference to the map.

This design relies upon the ability to read and write references atomically (which .NET guarantees) but it is only efficient if updates to the Map are rare. This is so in my case because most writes increment counters that already exist and, in the steady state, the creation of new counters is rare.