Regarding internal working of concurrent hashmap

2019-06-25 09:34发布

问题:

I was going through the ConcurrentHashMap and this related tutorial, and had some questions.

  1. In the article, it was mention that ConcurrentHashMap allows multiple readers to read concurrently without any blocking. This is achieved by partitioning the Map into different parts based on concurrency level and locking only a portion of the Map during updates. Default concurrency level is 16, and accordingly the Map is divided into 16 part and each part is governed with a different lock. This means, 16 threads can operate on Map simultaneously, until they are operating on different parts of the Map. This makes ConcurrentHashMap high performant despite keeping thread-safety intact. Though, it comes with a caveat: Since update operations like put(), remove(), putAll() or clear() are not synchronized, concurrent retrieval may not reflect the most recent change on the Map

  2. Another point also mentioned in the article: Another important point to remember is iteration over CHM, Iterator returned by keySet are weakly consistent and they only reflect state of ConcurrentHashMap at a certain point and may not reflect any recent change.

I have not understood the points highlighted in bold, could you provide more info or show me in a simple program?

回答1:

  1. Since update operations like put(), remove(), putAll() or clear() is not synchronized, concurrent retrieval may not reflect most recent change on Map

    As I understand it, this means that a modification of the map in one thread may not necessarily be seen by a retrieval happening at the same time in another thread. Consider the following example:

                      Thread 1 starts              Thread 1's call to get("a")
                     a call to get("a")             completes, returning null
                             |                                 |
    Thread 1        ---------+---------------------------------+-------
                                 time ----->
    Thread 2        -----+---------------------------+-----------------
                         |                           |
                 Thread 2 starts a            Thread 2's call to
                call to put("a", 1)          put("a", 1) completes
    

    Even though Thread 2 put a value in the map Thread 1's get completed execution, Thread 1 did not "see" the map modification, and returned null.

  2. Another important point to remember is iteration over CHM, Iterator returned by keySet of ConcurrentHashMap are weekly consistent and they only reflect state of ConcurrentHashMap and certain point and may not reflect any recent change.

    This is a similar situation. If Thread 1 obtains an Iterator from a ConcurrentHashMap's keySet, and later Thread 2 puts a new entry in the map, Thread 1's Iterator is not guaranteed to see that entry. (It may or it may not.)



回答2:

The real issue here is that when multiple threads are fooling with a data structure, the threads won't necessarily march in lock step.

One thread is reading for user1. One thread is writing for user2. Neither thread can predict where in their respective processes the other thread will be. Further, we can't predict for the users any sort of ordering on these two processes completion. If the write updates the data first, the read will show the updated state even though user1 might have requested a read a little earlier.

Reading or modifying when iterating work the same way with the additional consideration that the process of moving to the next one (when iterating) essentially becomes a "read" operation on the state of the Map, if not the content of any particular data in it.

So, when you allow concurrency in these data structures, you end up with a "close enough" test in terms of time. (It's a lot like the same considerations with databases except we are used to thinking of databases that way and the time frames are a couple of factors of 10 different.

NOTE: To make a comment about a wonderful little timeline shown by @Matts in another answer ...

The timeline shows two threads and a start and stop for each thread. The starts for the two threads can occur in either order (a,b) or (b,a). The ends can occur in either order because you can't tell how long the operations take. That gives 4 ways the two threads can start and finish. (a starts first and ends first, a starts first and b ends first, b starts first and a ends first, b starts first and b ends first) Now ... imagine 20 threads all doing the same thing in response to, say, 20 end users submitting requests for this and that. How many possible ways can it work.