How do you reproduce bugs that occur sporadically?

Posted 2020-05-14 04:46

We have a bug in our application that does not occur every time, so we don't know its "logic". I couldn't reproduce it even once in 100 attempts today.

Disclaimer: This bug exists and I've seen it. It's not a PEBKAC or something similar.

What are common techniques for reproducing this kind of bug?

28 answers
冷血范
#2 · 2020-05-14 05:19

"Heisenbugs" require great skill to diagnose. If you want help from people here, you have to describe the problem in much more detail, patiently run the tests and checks people suggest, report the results here, and iterate until you solve it (or decide it is too expensive in terms of resources).

You will probably have to tell us your actual situation: language, DB, operating system, workload estimate, the times of day it happened in the past, and a myriad of other things. List the tests you have already done and how they went, and be ready to do more and share the results.

And this will not guarantee that we collectively can find it, either...

Lonely孤独者°
#3 · 2020-05-14 05:20

Analyze the problem in a pair and pair-read the code. Make notes of the things you KNOW to be true and try to work out which logical preconditions must hold for this to happen. Follow the evidence like a CSI.

Most people instinctively say "add more logging", and this may be a solution. But for a lot of problems this just makes things worse, since logging can change timing-dependencies sufficiently to make the problem more or less frequent. Changing the frequency from 1 in 1000 to 1 in 1,000,000 will not bring you closer to the true source of the problem.

Even if your logical reasoning does not solve the problem, it will probably give you a few specifics to investigate with logging or assertions in your code.

\"骚年 ilove
#4 · 2020-05-14 05:20

For .NET projects you can use ELMAH (Error Logging Modules and Handlers) to monitor your application for uncaught exceptions. It's very simple to install and provides a very nice interface for browsing unknown errors.

http://code.google.com/p/elmah/

This saved me just today by catching a very random error that was occurring during a registration process.

Other than that, I can only recommend getting as much information from your users as possible and having a thorough understanding of the project workflow.

They mostly come out at night.... mostly

再贱就再见
#5 · 2020-05-14 05:21

Along with a lot of patience, a quiet prayer, and some cursing, you would need:

  • a good mechanism for logging the user actions
  • a good mechanism for gathering the data state when the user performs some actions (state in application, database etc.)
  • a check of the server environment (e.g. anti-virus software running at a particular time), plus a record of the times the error occurred, to see if you can find any trends
  • some more prayers & cursing...

HTH.
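To make the first two bullets a bit more concrete, here is a minimal sketch in Python (not tied to any particular framework) of recording each user action together with a snapshot of the state at that moment. `ActionRecorder`, its fields, and the example user/action names are all hypothetical, and the snapshot trick assumes the state is JSON-serializable.

```python
import json
import time


class ActionRecorder:
    """Hypothetical helper: log user actions with a snapshot of the state."""

    def __init__(self, clock=time.time):
        self.clock = clock      # injectable so tests can use a fixed time
        self.entries = []

    def record(self, user, action, state):
        entry = {
            "ts": self.clock(),
            "user": user,
            "action": action,
            # Round-trip through JSON to copy the state, so later mutations
            # of the live object do not rewrite history in the log.
            "state": json.loads(json.dumps(state)),
        }
        self.entries.append(entry)
        return entry

    def to_jsonl(self):
        """One JSON object per line, easy to grep or load afterwards."""
        return "\n".join(json.dumps(e, sort_keys=True) for e in self.entries)


recorder = ActionRecorder()
cart = {"items": ["book"]}
recorder.record("alice", "add_item", cart)
cart["items"].append("pen")          # the logged snapshot is unaffected
print(recorder.to_jsonl())
```

With timestamps in every entry, you can line the log up against server-side events (backups, anti-virus scans, cron jobs) when looking for trends.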

(This account has been suspended)
#6 · 2020-05-14 05:22

Add verbose logging. It will take multiple iterations, sometimes dozens, to add enough logging to understand the scenario. The catch is that if the problem is a race condition, which is likely if it doesn't reproduce reliably, logging can change the timing and the problem will stop happening. In that case, do not log to a file; keep a rotating buffer of the log in memory and dump it to disk only when you detect that the problem has occurred.
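A rough sketch of the in-memory rotating buffer idea, using Python's standard `logging` module and a bounded `collections.deque`. The class name `RingBufferHandler`, the logger name, and the tiny capacity are assumptions for illustration only; a real buffer would hold thousands of lines.

```python
import logging
from collections import deque


class RingBufferHandler(logging.Handler):
    """Keep only the most recent log lines in memory (hypothetical helper)."""

    def __init__(self, capacity=1000):
        super().__init__()
        # A deque with maxlen drops the oldest entry once it is full.
        self.records = deque(maxlen=capacity)

    def emit(self, record):
        # Format right away so the line reflects the state at log time.
        self.records.append(self.format(record))

    def dump(self, path):
        """Write the buffered lines to disk; call once the bug is detected."""
        with open(path, "w", encoding="utf-8") as fh:
            fh.write("\n".join(self.records))
            fh.write("\n")


log = logging.getLogger("heisenbug")   # hypothetical logger name
log.setLevel(logging.DEBUG)
log.propagate = False                  # keep these lines out of other handlers
ring = RingBufferHandler(capacity=3)   # tiny capacity, just for the demo
ring.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
log.addHandler(ring)

for step in range(5):
    log.debug("step %d", step)

# Only the last three lines survive; call ring.dump("bug.log") on detection.
print(list(ring.records))
```

Appending to memory is much cheaper than file I/O, so it disturbs timing less, though it can still disturb it somewhat.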

Edit: a few more thoughts. If this is a GUI application, run tests with a QA automation tool that lets you replay macros. If this is a service-type app, try to come up with at least a guess as to what is happening, and then programmatically create 'freak' usage patterns that exercise the code you suspect. Create higher-than-usual loads, etc.
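For the service-type case, one possible shape for such a 'freak' load generator is sketched below in Python. `hammer` and `suspect` are hypothetical names; `suspect` is a deterministic stand-in that you would replace with a call into the code you actually suspect, raising an exception whenever an invariant is violated.

```python
import threading


def hammer(task, workers=8, iterations=1000):
    """Run task(worker_id, i) from many threads at once; collect failures."""
    failures = []
    failures_lock = threading.Lock()
    # Release all threads at the same moment to maximize contention.
    start = threading.Barrier(workers)

    def run(worker_id):
        start.wait()
        for i in range(iterations):
            try:
                task(worker_id, i)
            except Exception as exc:
                with failures_lock:
                    failures.append((worker_id, i, repr(exc)))

    threads = [threading.Thread(target=run, args=(w,)) for w in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return failures


def suspect(worker_id, i):
    # Stand-in only: fails on one specific call so the harness has
    # something to report. Real code would check an invariant here.
    if worker_id == 0 and i == 7:
        raise ValueError("invariant violated")


for worker_id, i, error in hammer(suspect, workers=4, iterations=20):
    print(f"worker {worker_id}, iteration {i}: {error}")
```

Each failure records which worker and iteration triggered it, which gives you a starting point for narrowing down the interleaving.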

我欲成王,谁敢阻挡
#7 · 2020-05-14 05:23

Add precondition and postcondition checks to the methods related to this bug.

You may want to have a look at design by contract.
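As an illustration of what such checks could look like, here is a small Python sketch of a contract decorator. The `contract` decorator and the `withdraw` example are hypothetical and not taken from any library.

```python
import functools


def contract(pre=None, post=None):
    """Attach optional precondition/postcondition checks to a function."""

    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Explicit raise rather than `assert`, so the checks stay
            # active even when Python runs with -O.
            if pre is not None and not pre(*args, **kwargs):
                raise AssertionError(
                    f"precondition failed: {func.__name__} "
                    f"args={args!r} kwargs={kwargs!r}"
                )
            result = func(*args, **kwargs)
            if post is not None and not post(result, *args, **kwargs):
                raise AssertionError(
                    f"postcondition failed: {func.__name__} "
                    f"returned {result!r} for args={args!r} kwargs={kwargs!r}"
                )
            return result

        return wrapper

    return decorate


@contract(
    pre=lambda balance, amount: 0 < amount <= balance,
    post=lambda result, balance, amount: result == balance - amount,
)
def withdraw(balance, amount):
    return balance - amount


print(withdraw(100, 30))
```

When a sporadic bug finally does occur, a failed check points at the first method that saw bad data and includes the offending arguments in the message.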
