How do you reproduce bugs that occur sporadically?

Posted 2020-05-14 04:46

We have a bug in our application that does not occur every time, so we don't know its "logic". Today I could not reproduce it even once in 100 attempts.

Disclaimer: This bug exists and I've seen it. It's not a PEBKAC or anything similar.

What are common tips for reproducing this kind of bug?

28 answers
太酷不给撩
#2 · 2020-05-14 05:35

Read the stack trace carefully and try to guess what could have happened; then trace/log every line of code that could potentially cause trouble.

Keep your focus on resource disposal; many of the sneaky sporadic bugs I have found were related to closing/disposing things :).
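As a rough illustration of tracing resource disposal, here is a minimal Python sketch. The `traced` helper and the `open_resources` registry are hypothetical names made up for this example; the idea is only to log every acquire and release so a leaked or double-closed resource shows up in the log.

```python
import io
import logging
from contextlib import contextmanager

log = logging.getLogger("resource_trace")

# Hypothetical registry of the resources that are currently open.
open_resources = set()


@contextmanager
def traced(name, resource):
    """Log acquisition and release of a resource, and always close it."""
    open_resources.add(name)
    log.debug("acquired %s (open now: %s)", name, sorted(open_resources))
    try:
        yield resource
    finally:
        # Runs on normal exit and when the body raises.
        resource.close()
        open_resources.discard(name)
        log.debug("released %s (open now: %s)", name, sorted(open_resources))


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    with traced("report-buffer", io.StringIO()) as buf:
        buf.write("hello")
```

If a name is still in `open_resources` when the sporadic failure shows up, that resource is a good first suspect.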

爷的心禁止访问
#3 · 2020-05-14 05:39

Let’s say I’m starting with a production application.

  1. I typically add debug logging around the areas where I think the bug is occurring. I set up the logging statements to give me insight into the state of the application. Then I have the debug log level turned on and ask the user/operator(s) to notify me of the time of the next occurrence. I then analyze the log to see what hints it gives about the state of the application and whether that leads to a better understanding of what could be going wrong.

  2. I repeat step 1 until I have a good idea of where to start stepping through the code in the debugger.

  3. Sometimes the number of iterations of the code running is key, but other times it may be the interaction of a component with an outside system (database, specific user machine, operating system, etc.). Take some time to set up a debug environment that matches the production environment as closely as possible. VM technology is a good tool for solving this problem.

  4. Next I proceed via the debugger. This could include creating a test harness of some sort that puts the code/components in the state I've observed from the logs. Knowing how to set up conditional breakpoints can save a lot of time, so get familiar with that and other features of your debugger.

  5. Debug, debug, debug. If you're going nowhere after a few hours, take a break and work on something unrelated for a while. Come back with a fresh mind and perspective.

  6. If you have gotten nowhere by now, go back to step 1 and make another iteration.

  7. For really difficult problems you may have to resort to installing a debugger on the system where the bug is occurring. That combined with your test harness from step 4 can usually crack the really baffling issues.
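The logging from step 1 and the conditional breakpoint from step 4 could look roughly like the Python sketch below. The `apply_discount` function and the order dictionary are hypothetical stand-ins for whatever code is under suspicion; only the standard `logging` module and the built-in `breakpoint()` are real.

```python
import logging

log = logging.getLogger("orders")


def configure_logging(debug_enabled):
    """Switch the debug level on only while hunting the bug."""
    logging.basicConfig(format="%(asctime)s %(levelname)s %(name)s %(message)s")
    log.setLevel(logging.DEBUG if debug_enabled else logging.INFO)


def apply_discount(order, percent):
    """Hypothetical business function, instrumented with state logging."""
    log.debug("apply_discount start: order=%r percent=%r", order, percent)
    total = order["total"] * (100 - percent) / 100
    if total < 0:
        # Programmatic equivalent of a conditional breakpoint: execution only
        # reaches this branch when the suspicious state has been observed.
        log.error("negative total: order=%r percent=%r", order, percent)
        # breakpoint()  # uncomment to drop into the debugger right here
    log.debug("apply_discount end: total=%r", total)
    return total


if __name__ == "__main__":
    configure_logging(debug_enabled=True)
    apply_discount({"id": 1, "total": 200.0}, 25)
```

Logging the inputs as well as the result matters here: the test harness in step 4 needs the exact state that preceded the failure.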

聊天终结者
#4 · 2020-05-14 05:40

Unit tests. Testing for a bug in the app is often horrendous because there is so much noise and so many variable factors. In general, the bigger the (hay)stack, the harder it is to pinpoint the issue. Creatively extending your unit test framework to embrace edge cases can save hours or even days of sifting.

Having said that, there is no silver bullet. I feel your pain.
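One way to push a unit test toward edge cases is to run it many times with seeded random input and keep the seed that fails, so the failure becomes repeatable. The sketch below is only an illustration under that assumption: `buggy_max` is a hypothetical function with a deliberately planted bug, and `find_failing_seed` is a made-up helper, not part of any test framework.

```python
import random


def buggy_max(values):
    """Hypothetical function under test; deliberately wrong for all-negative input."""
    best = 0
    for v in values:
        if v > best:
            best = v
    return best


def check_max(rng):
    """One randomized test case, driven entirely by the given generator."""
    values = [rng.randint(-5, 5) for _ in range(rng.randint(1, 4))]
    assert buggy_max(values) == max(values), values


def find_failing_seed(check, attempts=1000):
    """Run check(rng) with seeds 0..attempts-1; return the first failing seed or None."""
    for seed in range(attempts):
        try:
            check(random.Random(seed))
        except AssertionError:
            return seed
    return None


if __name__ == "__main__":
    seed = find_failing_seed(check_max)
    print("first failing seed:", seed)
    # Re-running check_max(random.Random(seed)) reproduces the same failure.
```

Once a failing seed is known, the sporadic bug turns into a deterministic test case that can be stepped through in a debugger.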

狗以群分
#5 · 2020-05-14 05:40

Use an enhanced crash reporter. In the Delphi environment, we have EurekaLog and MadExcept. Other tools exist in other environments. Or you can diagnose the core dump. You're looking for the stack trace, which will show you where it's blowing up, how it got there, what's in memory, etc. It's also useful to have a screenshot of the app if it's a user-interaction thing, and info about the machine that it crashed on (OS version and patch level, what else was running at the time, etc.). Both of the tools that I mentioned can do this.
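For environments without such a tool, a very small crash reporter can be approximated by hand. The Python sketch below is not how EurekaLog or MadExcept work; it only shows the general idea of capturing the stack trace plus machine info, using the standard `sys.excepthook` mechanism. The function names and the `crash_report.txt` path are assumptions chosen for the example.

```python
import platform
import sys
import traceback
from datetime import datetime, timezone


def build_crash_report(exc_type, exc, tb):
    """Combine machine info and the full stack trace into one text report."""
    header = [
        "time: " + datetime.now(timezone.utc).isoformat(),
        "os: " + platform.platform(),
        "python: " + platform.python_version(),
    ]
    trace = "".join(traceback.format_exception(exc_type, exc, tb))
    return "\n".join(header) + "\n\n" + trace


def install_crash_reporter(path="crash_report.txt"):
    """Append a report to `path` for every uncaught exception."""
    def hook(exc_type, exc, tb):
        with open(path, "a", encoding="utf-8") as fh:
            fh.write(build_crash_report(exc_type, exc, tb))
        # Still print the usual traceback to stderr.
        sys.__excepthook__(exc_type, exc, tb)

    sys.excepthook = hook
```

Calling `install_crash_reporter()` once at startup is enough; users can then send the report file along with the time of the crash.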

If it's something that happens with a few users and they can reproduce it but you can't, go sit with them and watch. If the cause isn't apparent, switch seats: you "drive", and they tell you what to do. You'll uncover the subtle usability issues that way. Double-clicks on a single-click button, for example, can initiate re-entrancy in the OnClick event. That sort of thing. If the users are remote, use WebEx, Wink, etc., to record them crashing it, so you can analyze the playback.
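The double-click re-entrancy mentioned above can be illustrated, and guarded against, with a small Python sketch. `SaveButton` is a hypothetical stand-in for a GUI button and no real toolkit is used; the second click is simulated by calling `on_click` from inside the handler, which is roughly what happens when a handler lets the event loop process pending events.

```python
class SaveButton:
    """Hypothetical button with a guard against re-entrant click handling."""

    def __init__(self, action):
        self._action = action
        self._busy = False
        self.ignored_clicks = 0

    def on_click(self):
        if self._busy:
            # A second click arrived while the first is still being handled.
            self.ignored_clicks += 1
            return
        self._busy = True
        try:
            self._action(self)
        finally:
            self._busy = False


if __name__ == "__main__":
    calls = []

    def save(button):
        calls.append("save")
        button.on_click()  # simulated second click during the handler

    button = SaveButton(save)
    button.on_click()
    print(len(calls), button.ignored_clicks)
```

Counting the ignored clicks, instead of silently dropping them, gives a cheap signal in the logs that users really are double-clicking.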
