What could solve this multi-threaded scenario best?

Published 2019-07-07 11:00

I have a persistent B+tree; multiple threads read different chunks of the tree and perform some operations on the data they read. The interesting part: each thread produces a set of results, and as the end user I want to see all the results in one place. What I do now: one ConcurrentDictionary that all threads write to.

Everything works smoothly this way. But the application is time-critical; one extra second means total dissatisfaction. Because of its thread-safety overhead, ConcurrentDictionary is intrinsically slow compared to Dictionary.

I could use Dictionary instead, with each thread writing its results to its own dictionary. But then I'd have the problem of merging the different dictionaries.
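Roughly what the two approaches look like (a minimal sketch; the key/value types, ProcessChunk, and the sizes are placeholders for my real tree work):

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading.Tasks;

class TwoApproaches
{
    // Stand-in for the real per-thread tree work.
    static IEnumerable<(long Key, double Value)> ProcessChunk(int chunk)
    {
        for (long k = 0; k < 1000; k++)
            yield return (chunk * 1000L + k, k * 0.5);
    }

    static void Main()
    {
        int chunkCount = Environment.ProcessorCount;

        // What I do now: every thread writes into one shared ConcurrentDictionary.
        var shared = new ConcurrentDictionary<long, double>();
        Parallel.For(0, chunkCount, i =>
        {
            foreach (var (key, value) in ProcessChunk(i))
                shared.TryAdd(key, value);
        });

        // The alternative: each thread fills its own Dictionary, merged later.
        var locals = new Dictionary<long, double>[chunkCount];
        Parallel.For(0, chunkCount, i =>
        {
            var local = new Dictionary<long, double>();
            foreach (var (key, value) in ProcessChunk(i))
                local[key] = value;
            locals[i] = local;
        });
    }
}
```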


My Questions:

  1. Are concurrent collections a good choice for my scenario?
  2. If not (1), how would I merge the different dictionaries optimally? Note that (a) copying items one-by-one and (b) LINQ are known solutions, and neither is as fast as expected :)
  3. If not (2) ;-) what would you suggest instead?


Some quick info:

  • Thread count = processorCount. The application can run on anything from a standard laptop (e.g., 4 threads) to a high-end server (e.g., <32 threads).
  • Item count: the tree usually holds more than 1.0E+12 items.

2 Answers
叛逆 · 2019-07-07 11:25

From your timings it seems that the locking/building of the result dictionary is taking 3700ms per thread with the actual processing logic taking just 300ms.

I suggest that, as an experiment, you let each thread create its own local dictionary of results. Then you can see how much time is spent building the dictionary compared to how much is lost to locking across threads.
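Something along these lines (a rough sketch; ProcessChunk stands in for your real per-thread work, and the key/value types are placeholders):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;
using System.Threading.Tasks;

class LocalBuildTiming
{
    // Stand-in for the real per-thread tree work.
    static IEnumerable<(long Key, double Value)> ProcessChunk(int chunk)
    {
        for (long k = 0; k < 1_000_000; k++)
            yield return (chunk * 1_000_000L + k, k * 0.5);
    }

    static void Main()
    {
        var buildTimes = new long[Environment.ProcessorCount];

        // No shared state, no locking: pure local dictionary construction.
        Parallel.For(0, buildTimes.Length, i =>
        {
            var sw = Stopwatch.StartNew();
            var local = new Dictionary<long, double>();
            foreach (var (key, value) in ProcessChunk(i))
                local[key] = value;
            sw.Stop();
            buildTimes[i] = sw.ElapsedMilliseconds;
        });

        Console.WriteLine($"slowest local build: {buildTimes.Max()} ms");
    }
}
```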

If building the local dictionary alone adds more than 300ms, then it will not be possible to meet your time limit: even without any locking, or any attempt to merge the results, it has already taken too long.

Update

It seems that you can either pay the merge price as you go along, with the locking causing the threads to sit idle for a significant percentage of the time, or pay the price in a post-processing merge. Either way, the core problem is that the locking means you are not fully utilising the available CPU.
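If the post-processing merge wins, one small thing that helps is pre-sizing the target dictionary so it never rehashes during the merge. A sketch, assuming last-writer-wins on duplicate keys and the same placeholder types as above:

```csharp
using System.Collections.Generic;
using System.Linq;

static class MergeHelper
{
    // Merge the per-thread dictionaries once, after all workers finish.
    public static Dictionary<long, double> Merge(Dictionary<long, double>[] locals)
    {
        // Pre-size so the merged dictionary never has to resize/rehash.
        var merged = new Dictionary<long, double>(locals.Sum(d => d.Count));
        foreach (var local in locals)
            foreach (var kv in local)
                merged[kv.Key] = kv.Value;   // last writer wins on duplicates
        return merged;
    }
}
```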

The only real solution for getting maximum performance from your cores is to use a non-blocking dictionary implementation that is also thread-safe. I could not find a .NET implementation, but I did find a research paper detailing an algorithm which indicates it is possible.

Implementing such an algorithm correctly is not trivial but would be fun!

Scalable and Lock-Free Concurrent Dictionaries

SAY GOODBYE · 2019-07-07 11:34

Have you considered async persistence? Is it allowed in your scenario?

You can push the results to a queue handled by a separate thread pool (using a pool avoids the overhead of creating a thread for each request), and handle the merging logic there without affecting response time.
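A minimal sketch of that idea using System.Threading.Channels (the names, types, and sizes are illustrative, not from the question): worker threads publish into a channel, and a single background consumer owns the merged dictionary, so the merge itself needs no locks.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading.Channels;
using System.Threading.Tasks;

class BackgroundMerge
{
    static async Task Main()
    {
        var channel = Channel.CreateUnbounded<KeyValuePair<long, double>>();
        var merged = new Dictionary<long, double>();

        // Single consumer: only this task ever touches `merged`, so no locks.
        var consumer = Task.Run(async () =>
        {
            await foreach (var kv in channel.Reader.ReadAllAsync())
                merged[kv.Key] = kv.Value;
        });

        // Producers: worker threads publish results without blocking each other.
        await Task.WhenAll(Enumerable.Range(0, Environment.ProcessorCount)
            .Select(i => Task.Run(() =>
            {
                for (long k = 0; k < 1000; k++)
                    channel.Writer.TryWrite(new(i * 1000L + k, k * 0.5));
            })));

        channel.Writer.Complete();   // no more input
        await consumer;              // let the background merge drain
        Console.WriteLine($"merged {merged.Count} items");
    }
}
```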
