How to detect and debug multi-threading problems?

2020-01-25 03:21发布

This is a follow up to this question, where I didn't get any input on this point. Here is the brief question:

Is it possible to detect and debug problems coming from multi-threaded code?

Often we have to tell our customers: "We can't reproduce the problem here, so we can't fix it. Please tell us the steps to reproduce the problem, then we'll fix it." It's a somehow nasty answer if I know that it is a multi-threading problem, but mostly I don't. How do I get to know that a problem is a multi-threading issue and how to debug it?

I'd like to know if there are any special logging frameworks, or debugging techniques, or code inspectors, or anything else to help solving such issues. General approaches are welcome. If any answer should be language related then keep it to .NET and Java.

17条回答
霸刀☆藐视天下
2楼-- · 2020-01-25 04:13

assert() is your friend for detecting race-conditions. Whenever you enter a critical section, assert that the invariant associated with it is true (that's what CS's are for). Though, unfortunately, the check might be expensive and thus not suitable for use in production environment.

查看更多
太酷不给撩
3楼-- · 2020-01-25 04:14

Threading/concurrency problems are notoriously difficult to replicate - which is one of the reasons why you should design to avoid or at least minimize the probabilities. This is the reason immutable objects are so valuable. Try to isolate mutable objects to a single thread, and then carefully control the exchange of mutable objects between threads. Attempt to program with a design of object hand-over, rather than "shared" objects. For the latter, use fully synchronized control objects (which are easier to reason about), and avoid having a synchronized object utilize other objects which must also be synchronized - that is, try to keep them self contained. Your best defense is a good design.

Deadlocks are the easiest to debug, if you can get a stack trace when deadlocked. Given the trace, most of which do deadlock detection, it's easy to pinpoint the reason and then reason about the code as to why and how to fix it. With deadlocks, it always going to be a problem acquiring the same locks in different orders.

Live locks are harder - being able to observe the system while in the error state is your best bet there.

Race conditions tend to be extremely difficult to replicate, and are even harder to identify from manual code review. With these, the path I usually take, besides extensive testing to replicate, is to reason about the possibilities, and try to log information to prove or disprove theories. If you have direct evidence of state corruption you may be able to reason about the possible causes based on the corruption.

The more complex the system, the harder it is to find concurrency errors, and to reason about it's behavior. Make use of tools like JVisualVM and remote connect profilers - they can be a life saver if you can connect to a system in an error state and inspect the threads and objects.

Also, beware the differences in possible behavior which are dependent on the number of CPU cores, pipelines, bus bandwidth, etc. Changes in hardware can affect your ability to replicate the problem. Some problems will only show on single-core CPU's others only on multi-cores.

One last thing, try to use concurrency objects distributed with the system libraries - e.g in Java java.util.concurrent is your friend. Writing your own concurrency control objects is hard and fraught with danger; leave it to the experts, if you have a choice.

查看更多
再贱就再见
4楼-- · 2020-01-25 04:16

Sometimes, multithreaded solutions cannot be avoided. If there is a bug,it needs to be investigated in real time, which is nearly impossible with most tools like Visual Studio. The only practical solution is to write traces, although the tracing itself should:

  1. not add any delay
  2. not use any locking
  3. be multithreading safe
  4. trace what happened in the correct sequence.

This sounds like an impossible task, but it can be easily achieved by writing the trace into memory. In C#, it would look something like this:

public const int MaxMessages = 0x100;
string[] messages = new string[MaxMessages];
int messagesIndex = -1;

public void Trace(string message) {
  int thisIndex = Interlocked.Increment(ref messagesIndex);
  messages[thisIndex] = message;
}

The method Trace() is multithreading safe, non blocking and can be called from any thread. On my PC, it takes about 2 microseconds to execute, which should be fast enough.

Add Trace() instructions wherever you think something might go wrong, let the program run, wait until the error happens, stop the trace and then investigate the trace for any errors.

A more detailed description for this approach which also collects thread and timing information, recycles the buffer and outputs the trace nicely you can find at: CodeProject: Debugging multithreaded code in real time 1

查看更多
兄弟一词,经得起流年.
5楼-- · 2020-01-25 04:16

I implemented the tool vmlens to detect race conditions in java programs during runtime. It implements an algorithm called eraser.

查看更多
闹够了就滚
6楼-- · 2020-01-25 04:19

I'm using GNU and use simple script

$ more gdb_tracer

b func.cpp:2871
r
#c
while (1)
next
#step
end
查看更多
登录 后发表回答