How do you reproduce bugs that occur sporadically?

Posted 2020-05-14 04:46

We have a bug in our application that does not occur every time, so we don't know its "logic". I couldn't reproduce it even once in 100 attempts today.

Disclaimer: This bug exists and I've seen it. It's not a PEBKAC or something similar.

What are common techniques for reproducing this kind of bug?

28 answers
霸刀☆藐视天下
#2 · 2020-05-14 05:28

I'd suggest writing down everything the user was doing. If you have, let's say, 10 such bug reports, you can try to find something that connects them.
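As a rough sketch of "finding something that connects them": if each bug report is written down as a small record, a few lines of Python can list the field values that show up in most reports. The field names (`os`, `locale`, `last_action`) and the `min_share` threshold are made up for illustration; use whatever your reports actually capture.

```python
from collections import Counter

def common_factors(reports, min_share=0.8):
    """Return (field, value) pairs present in at least min_share of the reports."""
    counts = Counter()
    for report in reports:
        # Count each (field, value) pair once per report.
        counts.update(set(report.items()))
    threshold = min_share * len(reports)
    return sorted(pair for pair, n in counts.items() if n >= threshold)

# Hypothetical bug reports; the fields are assumptions, not a fixed schema.
reports = [
    {"os": "Windows 7", "locale": "de_DE", "last_action": "export"},
    {"os": "Windows 10", "locale": "de_DE", "last_action": "export"},
    {"os": "Windows 10", "locale": "en_US", "last_action": "export"},
]
print(common_factors(reports, min_share=1.0))
```

Lowering `min_share` surfaces factors that are shared by most, but not all, reports, which is often closer to what real bug reports look like.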

甜甜的少女心
#3 · 2020-05-14 05:28

The team that I work with has enlisted users to record the time they spend in our app with CamStudio when we've got a pesky bug to track down. It's easy to install and easy for them to use, and it makes reproducing those nagging bugs much easier, since you can watch what the users are doing. It also doesn't depend on the language you're working in, since it just records the Windows desktop.

However, this route seems to be viable only if you're developing corporate apps and have good relationships with your users.

一纸荒年 Trace。
#4 · 2020-05-14 05:31

Try to add code to your app to trace the bug automatically once it happens (or even alert you via mail / SMS).

Log whatever you can, so that when it happens you can capture the relevant system state.
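One way this could look in Python, as a minimal sketch: a decorator that logs the arguments, a small state snapshot, and the traceback whenever the wrapped function raises. The logger name `bugtrace` and the fields in `snapshot()` are assumptions; a real app would record its own state.

```python
import functools
import logging
import platform
import threading

logger = logging.getLogger("bugtrace")  # hypothetical logger name

def snapshot():
    """Collect a few state fields; a real app would add its own."""
    return {
        "python": platform.python_version(),
        "platform": platform.platform(),
        "threads": [t.name for t in threading.enumerate()],
    }

def traced(func):
    """Log arguments, a state snapshot, and the traceback when func raises."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception:
            # logger.exception appends the current traceback to the record.
            logger.exception("failure in %s args=%r kwargs=%r state=%r",
                             func.__name__, args, kwargs, snapshot())
            raise  # re-raise so the app's behaviour is unchanged
    return wrapper
```

Putting `@traced` on a suspect function leaves its behaviour as it was. For the mail alert, attaching a `logging.handlers.SMTPHandler` to the same logger is one option.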

Another thing: try applying automated testing, which can cover more territory than human-based testing and does so systematically. It's a long shot, but good practice in general.
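One possible reading of "automated testing that covers more territory" is randomized testing with a recorded seed, so that a failing run can be replayed exactly. The sketch below assumes that reading; `buggy_max` is a deliberately broken, made-up function standing in for the code under test.

```python
import random

def find_failing_seed(check, runs=1000, base_seed=0):
    """Call check(rng) with many seeds; return the first failing seed, else None."""
    for seed in range(base_seed, base_seed + runs):
        rng = random.Random(seed)  # a private generator makes each run replayable
        try:
            check(rng)
        except AssertionError:
            return seed
    return None

def buggy_max(values):
    # Hypothetical bug for illustration: the last element is never compared.
    best = values[0]
    for v in values[:-1]:
        if v > best:
            best = v
    return best

def check(rng):
    # Generate a small random input and compare against a trusted answer.
    values = [rng.randint(0, 100) for _ in range(rng.randint(1, 5))]
    assert buggy_max(values) == max(values), values

seed = find_failing_seed(check)
print("first failing seed:", seed)
```

Once a seed is known, calling `check(random.Random(seed))` again reproduces the same failing input every time, which turns a sporadic failure into a deterministic one.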

We Are One
#5 · 2020-05-14 05:32

What development environment are you using? For C++, your best bet may be VMware Workstation record/replay; see: http://stackframe.blogspot.com/2007/04/workstation-60-and-death-of.html

Other suggestions include inspecting the stack trace and careful code review... there is really no silver bullet :)

▲ chillily
#6 · 2020-05-14 05:32

Assuming you're on Windows, and your "bug" is a crash or some sort of corruption in unmanaged code (C/C++), take a look at Application Verifier from Microsoft. The tool has a number of stops that can be enabled to verify things at runtime. If you have an idea of the scenario where your bug occurs, try to run through that scenario (or a stress version of it) with AppVerifier running. Make sure to either turn on pageheap in AppVerifier, or consider compiling your code with the /RTCcsu switch (see http://msdn.microsoft.com/en-us/library/8wtf2dfz.aspx for more information).

冷血范
#7 · 2020-05-14 05:33

This varies (as you say), but some of the things that can be handy here are:

  • Going into the debugger immediately when the problem occurs and dumping all the threads (or the equivalent, such as dumping core immediately).
  • Running with logging turned on but otherwise entirely in release/production mode. (This is possible in some environments, such as C and Rails, but not many others.)
  • Doing things to make the edge conditions on the machine worse: force low memory / high load / more threads / serving more requests.
  • Making sure that you're actually listening to what the users encountering the problem are saying, and that they're explaining the relevant details. This seems to be the one that trips people up in the field a lot. Trying to reproduce the wrong problem is boring.
  • Getting used to reading assembly produced by optimizing compilers. This seems to stop people sometimes, and it isn't applicable to all languages/platforms, but it can help.
  • Being prepared to accept that it is your (the developer's) fault. Don't fall into the trap of insisting the code is perfect.
  • Sometimes, tracking the problem down on the very machine it is happening on.
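To illustrate the "more threads / more requests" point from the list above, here is a minimal stress-harness sketch in Python. It runs a scenario from many threads at once and collects whatever exceptions come out. The `stress` function, its parameters, and the default counts are all assumptions for illustration, not a standard tool.

```python
import threading

def stress(scenario, threads=16, iterations=200):
    """Run scenario() concurrently from several threads; return raised exceptions."""
    failures = []
    lock = threading.Lock()
    start = threading.Barrier(threads)

    def worker():
        start.wait()  # release all threads together to increase contention
        for _ in range(iterations):
            try:
                scenario()
            except Exception as exc:
                with lock:
                    failures.append(exc)

    pool = [threading.Thread(target=worker) for _ in range(threads)]
    for t in pool:
        t.start()
    for t in pool:
        t.join()
    return failures
```

Calling `stress(my_scenario)` with the suspect code path (here a hypothetical `my_scenario` callable) and raising `threads` or `iterations` may make a timing-dependent failure show up far more often than in normal use, though there is no guarantee that it will.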