What's the best way to implement a Hash that can be modified across multiple threads, but with the smallest number of locks. For the purposes of this question, you can assume that the Hash will be read-heavy. It must be thread-safe in all Ruby implementations, including ones that operate in a truly simultaneous fashion, such as JRuby, and it must be written in pure-Ruby (no C or Java allowed).
Feel free to submit a naïve solution that always locks, but that isn't likely to be the best solution. Points for elegance, but a smaller likelihood of locking wins over smaller code.
Since you mention the Hash would be read heavy, having one mutex locking both read and writes would result in race conditions that are most probably won by reads. If that's ok with you, then ignore the answer.
If you want to give writes a priority, an read-write lock would help. The following code is based on some old c++ assignment for Operating Systems class, so might not be best quality, but gives a general idea.
Then just wrap []= and [] in lock.write and lock.read. Might have a performance impact, but will guarantee that writes will 'get through' the reads. Usefulness of this depends on how read heavy it actually is.
Unfortunately I can't add a comment to Michael Sofaer answer where he introduces: class RWLock and class LockedHash with @reader_count etc. (don't have enough karma yet)
That solution does not work. It gives an error: in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)
Due to the logical bug: when it's time to unlock things then unlock happens 1 extra time (because of missing check my_block?(). Instead it unblocks it even if unblocking was not necessary "is my block") and so 2nd unlock on already unlocked mutes raises an exception. (I'll paste full code on how to reproduce this error at the end of this post).
Also Michael mentioned "the each method is unsafe (allows mutations by other threads during an iteration)" which was critical for me, so I end up with this simplified solution which works for all my use cases and it simply locks mutex on any call to any hash method when called from different thread (calls from the same thread, which owns the lock are not blocking to avoid deadlocks):
And now the full example to demonstrate / reproduce the error of double unlocking in Michael Sofaer's solution:
, which gives the following error:
Okay, now that you specified the actually meaning of 'threadsafe', here are two potential implementations. The following code will run forever in MRI and JRuby. The lockless implementation follows an eventual consistency model where each thread uses it's own view of the hash if the master is in flux. There is a little trickery required to make sure storing all the information in the thread doesn't leak memory, but that is handled and tested ― process size does not grow running this code. Both implementations would need more work to be 'complete', meaning delete, update, etc. would need some thinking, but either of the two concepts below will meet your requirements.
It's very important for people reading this thread to realize the whole issue is exclusive to JRuby ― in MRI the built-in Hash is sufficient.
Posting base/naive solution, just to boost my Stack Overflow cred:
i'm pretty unclear on what is meant by this. i think the simplest implementation is simply
that is to say the built-in ruby hash is threadsafe if by threadsafe you mean will not blow up if > 1 threads tries to access it. this code will run safely forever
i suspect by thread safe you really mean ACID - for instance a write like hash[:key]=:val followed by a read if has[:key] would return :val. but no amount of trickery with locking can provide that - the last in would always win. for example, say you have 42 thread all updating a threadsafe hash - which value should be read by the 43'rd?? surely by threasafe you don't mean some sort of total ordering on writes - therefore if 42 threads were actively writing the 'correct' value is any right? but ruby's built-in Hash works in just this way...
perhaps you mean something like
in one thread and
would not interfere with one another? i can imagine wanting that to be threadsafe, but that's not even safe in a single thread with the MRI ruby (obviously you cannot modify a hash while iterating over it)
so can you be more specific about what you mean by 'threadsafe' ??
the only way to give ACID semantics would be a gross lock (sure this could be a method that took a block - but still an external lock).
ruby's thread scheduler isn't just going to schedule a thread smack in the middle of some arbitrary c function (like the built-in hash aref aset methods) so those are effectively threadsafe.
This might be a use case for the hamster gem
Hamster implements Hash Array Mapped Tries (HAMT), as well as some other persistent data structures, in pure Ruby.
Persistent data structures are immutable, and instead of mutating (changing) the structure, such as by adding or replacing a key-value pair in a Hash, you instead return a new data structure which contains the change. The trick, with the persistent immutable data structures, is that the newly returned data structure re-uses as much of the predecessor as possible.
I think to implement using hamster, you would use their mutable hash wrapper, which passes all reads to the current value of the persistent immutable hash (ie, should be fast), while guarding all writes with a mutex, and swapping to the new value of the persistent immutable hash after the write.
For example:
So, let's use this for a similar type of problem to the one described:
(gist here)
I'm getting the following outputs:
What do you think of this?
This solution is more similar to how one might solve this in Scala or Clojure, although in those languages one would more likely be using software transactional memory with low-level CPU support for the atomic compare and swap operations which are implemented.
Edit: It's worth noting that one reason the hamster implementation is fast is that it features a lock-free read path. Please reply in comments if you have questions about that or how it works.