How does concurrency using immutable/persistent ty

2019-01-31 09:19发布

I've read a lot about functional languages recently. Since those use only immutable structures they claim that concurrency issues are greatly improved/solved. I'm having some serious trouble understanding how this can actually be delivered in a real-life context. Let's assume we have a web server with one thread listening on a port (well, IO is another thing i have difficulty wrapping my head around but let's just ignore that for now); On any connection attempt a socket is created and passed to a newly created thread which does some work with it and, depending on the received communications, may apply changes to a big list/data structure which is global to the server application. So, how does this list access work then in order for all threads having a consistent view of the list (or at least in order to have all changes made by one thread applied to the list as soon as the thread dies in a correct manner)?

My problems understanding are:

  • Obviously any thread can get a unchangeable "snapshot" of the list to work on. However, after "changing" the contents by creating a new version of the list with the changes applied, we are still left with every thread having their own version of the list. How are those merged back together?
  • Another method might consist of using traditional locking mechanisms like mutex/cond or go-like-channels. However, how would you even create such a thing when all variables are immutable?
  • I've heard about STM, however that cannot deal with side effects (i.e. if the list would also transparently backup the data to a file or db)

So how would you model such a thing in a functional language?

1条回答
Animai°情兽
2楼-- · 2019-01-31 09:58

Immutable values have several applications for which they are well suited. The concurrent/parallel processing is just one of them that got more important recently. The following is really the most basic digest from experience and many books and talks about the subject. You may need to dive into some eventually.

The main example you show here is about managing global state, so it cannot be done purely "immutably". However, even here there are very good reasons to use immutable data structures. Some of those from top of my head:

  • try - catch behaves much better, because you do not modify shared object that may be left modified half-way, with immutable values, it automatically keeps the last consistent state
  • reducing changing state to just multicore safe "compare-and-swap" operations on very limited set of global variables (ideally one), completely eliminates deadlocks
  • free passing of data structures without any defensive copying that is quite often case of mysterious bugs when forgotten (many times defensive copies are created in both calling and called functions, because developers start to incline to "better safe than sorry" after a couple of debugging sessions)
  • much easier unit testing, because many functions operating on immutable values are side-effect free
  • usually easier serialization and more transparent comparison semantics much easier debugging and taking (logging) a current snapshot of the system state even asynchronously

Back to your question though.

In the most trivial case the global state in this case is often modelled using one mutable reference at the top holding onto an immutable data structure.

The reference is updated only by CAS atomic operation.

The immutable data structure is transformed by a side-effect free functions and when all transformations are done the reference is swapped atomically.

If two threads/cores want to swap simultaneously new values got from the same old one, the one doing that first wins the other does not succeed (CAS semantics) and needs to repeat the operation (depends on the transformation, either updating the current one with the new value, or transforming the new value from the beginning). This may seem wasteful, but the assumption here is that redoing some work is often cheaper than permanent locking/synchronization overhead.

Of course this can be optimized e.g. by partitioning independent parts of immutable data structures to further reduce potential collisions by having several references being updated independently.

Access to the data structure is lock-free and very fast and always gives a consistent response. Edge cases like when you send an update and another client receives older data afterwards is to be expected in any system, because of network requests can get out of order too...

STM is useful quite rarely and usually you are better of to use atomic swaps of data structure containing all values from references you would use in STM transaction.

查看更多
登录 后发表回答