Ways to Find a Race Condition

Posted 2020-06-01 01:08

I have a bit of code with a race condition in it. I believe it is a race condition because the failure does not happen consistently, and it seems to happen more often on dual-core machines.

It never happens when I'm tracing, although there is a possibility that it could be a deadlock as well. By comparing how far the logs get in runs where the problem does and does not occur, I've been able to pin this bug down to a single function. However, I do not know where within that function it happens. It's not at the top level.

If it is a race condition, adding log statements or breakpoints is going to change the timing and may prevent the bug from happening.

Is there any technique that I can use aside from getting a race condition analyzer that will allow me to pinpoint where this is happening?

This is in Visual Studio 9, with native (unmanaged) C++.

8 answers
男人必须洒脱
#2 · 2020-06-01 01:59

Indeed there are some attempts to find race conditions automatically.

Another term I read in conjunction with race condition detection is RaceFuzzer, but I was not able to find really useful information about it.

I think this is a relatively young field of investigation, so there are, as far as I know, mainly theoretical papers on the subject. However, try googling one of the above keywords; maybe you will find some useful information.

地球回转人心会变
#3 · 2020-06-01 02:02

So, the sledgehammer method for me has been the following, which takes a lot of patience and in the best case gets you on the right track. I used it to figure out what was going on with this particular problem.

I have been using tracepoints: one at the beginning of the suspected high-level function and one at the end. If adding the tracepoint at the beginning of the function causes your bug to stop happening, move the tracepoint down until you can reproduce the condition again. The idea is that the tracepoint will not affect timing if you place it after the call that eventually triggers the unsafe code, but will if you place it before.

Also, watch your output window. Between which messages does your bug occur? You can use tracepoints to narrow this range as well.

Once you narrow your bug down to a manageable region of code, you can throw in breakpoints and have a look at what the other threads are up to at this point.
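If even a tracepoint that writes to the output window shifts the timing too much, one possible complement to the approach above is to record probe hits in memory and read them after the failure. The following is only a minimal sketch, not something taken from the question: `TraceEntry`, `trace_point`, `trace_snapshot`, `g_trace` and `kTraceCapacity` are all made-up names, and it assumes a C++11 compiler. Visual Studio 9 has no `<atomic>` or `<thread>`, so there the Win32 calls `InterlockedIncrement` and `GetCurrentThreadId` would be the likely substitutes. The atomic increment still costs something, so it may still hide the bug in some cases.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// All names here are hypothetical; none come from Visual Studio or a library.
struct TraceEntry {
    int point_id;            // which probe fired
    std::thread::id thread;  // which thread hit it
};

const std::size_t kTraceCapacity = 4096;  // must be a power of two
static TraceEntry g_trace[kTraceCapacity];
static std::atomic<std::size_t> g_trace_next(0);

// Record a probe hit: one atomic increment plus two plain stores.
// No locks and no I/O, so it should disturb timing less than printing.
// Assumption: fewer than kTraceCapacity hits are in flight at once;
// otherwise two threads could write the same slot.
inline void trace_point(int point_id) {
    std::size_t slot = g_trace_next.fetch_add(1, std::memory_order_relaxed);
    TraceEntry& e = g_trace[slot & (kTraceCapacity - 1)];
    e.point_id = point_id;
    e.thread = std::this_thread::get_id();
}

// Copy out the surviving entries, oldest first.
// Intended to be called only once the worker threads have stopped.
std::vector<TraceEntry> trace_snapshot() {
    std::size_t total = g_trace_next.load();
    std::size_t count = total < kTraceCapacity ? total : kTraceCapacity;
    std::size_t first = total - count;  // index of the oldest surviving entry
    std::vector<TraceEntry> out;
    out.reserve(count);
    for (std::size_t i = 0; i < count; ++i)
        out.push_back(g_trace[(first + i) & (kTraceCapacity - 1)]);
    assert(out.size() <= kTraceCapacity);
    return out;
}
```

To use it, scatter calls such as `trace_point(1)`, `trace_point(2)` through the suspected function, let the bug reproduce, then either inspect `g_trace` in the debugger or print the result of `trace_snapshot()`. The last few entries show which probes each thread reached before things went wrong.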
