
Question:

Here is my problem: I am trying to debug Apache Cassandra and understand the flow of the app. That is, when a client sends a request, say put(), which methods are called and how does the system work internally?

So, here is what I am thinking:

1. Write a main method in the Cassandra code that calls the entry-point put() method, set breakpoints in Eclipse, and so on; OR
2. Don't write a main method; simply use a regular client (which accesses the server via TCP) and "debug" by reading the log files and the code, using the log4j loggers already implemented in Cassandra.

So, my question is, what is the ideal way of debugging such a distributed application?

Answer 1:

Ideal way? Both, and more.

You mentioned two objectives: "debug" and "understand the flow of the application". It's very hard to debug before you understand the flow, but understanding may be an end in itself.

In the real world, when dealing with large distributed systems, one often cannot rely on debuggers, at least initially, not least because some problems only show up when the system is busy or after hours of running. Hence good debug trace, and fine-grained control over that trace, in both the application code and the infrastructure code is essential.
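As a rough illustration of what fine-grained control over trace can look like, here is a minimal sketch using only the JDK's java.util.logging, so that it runs without extra libraries. log4j offers the same kind of per-logger level control through its configuration. The logger names "demo.storage" and "demo.network" are made up for this example and are not Cassandra's.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class TraceControlSketch {
    // Hypothetical subsystem loggers; the names are illustrative only.
    static final Logger STORAGE = Logger.getLogger("demo.storage");
    static final Logger NETWORK = Logger.getLogger("demo.network");

    // Turn on detailed trace for the subsystem under investigation
    // and keep the rest of the system quiet.
    static void traceStorageOnly() {
        STORAGE.setLevel(Level.FINE);
        NETWORK.setLevel(Level.WARNING);
    }

    public static void main(String[] args) {
        traceStorageOnly();

        // Guarding with isLoggable avoids building expensive messages
        // for trace that is switched off.
        if (STORAGE.isLoggable(Level.FINE)) {
            // Note: the JDK's default console handler only prints INFO and
            // above, so its level must be lowered too before this shows up.
            STORAGE.fine("detailed storage trace");
        }
        if (NETWORK.isLoggable(Level.FINE)) {
            NETWORK.fine("never reached while NETWORK is at WARNING");
        }

        System.out.println("storage FINE enabled: " + STORAGE.isLoggable(Level.FINE));
        System.out.println("network FINE enabled: " + NETWORK.isLoggable(Level.FINE));
    }
}
```

The point of the design is that trace for one component can be raised on a busy system without flooding the logs with output from every other component.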

However, if you have the opportunity to run in a debugger, that can be quite illuminating.

Before all of that I think you need to:

a). Study any design documentation that there may be.

b). Browse the source code in a good IDE, e.g. Eclipse. Just follow the control flow. Hmm, here's an interesting bit, I wonder where it gets called from? A call to that method on a class, what does that do? When does that constructor get called?

With some of that in your head, following the trace is much easier, and you have a better idea of where to put the breakpoints.

Answer 2:

How about using log4j's MDC, setting it right before put() and then clearing it after put() has exited? Then you can see what really happened in there, provided you have other logging set up in the methods that are executed inside put(). If you are somewhere deep in that method, log the stack trace now and then, so you can see where you currently are.
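The idea can be sketched as follows. To keep the example runnable with the JDK alone, it does not use log4j; it imitates the MDC with a ThreadLocal map. With log4j 1.x the equivalent calls would be org.apache.log4j.MDC.put(key, value) and MDC.remove(key), with %X{key} in the PatternLayout to print the value. The put() and writeRow() methods here are hypothetical stand-ins, not Cassandra's real methods.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MdcSketch {
    // Per-thread diagnostic context, a stand-in for log4j's MDC.
    private static final ThreadLocal<Map<String, String>> CONTEXT =
            ThreadLocal.withInitial(HashMap::new);

    // Captured log lines, kept in memory so this single-threaded demo
    // is easy to inspect.
    static final List<String> LINES = new ArrayList<>();

    // Every log line is prefixed with the current "op" value, if one is set.
    static void log(String message) {
        String op = CONTEXT.get().get("op");
        String line = (op == null ? "" : "[op=" + op + "] ") + message;
        LINES.add(line);
        System.out.println(line);
    }

    // Hypothetical internal step reached from put().
    static void writeRow(String key) {
        log("writing row " + key);
    }

    // Hypothetical entry point: set the context on entry, clear it on exit.
    static void put(String key, String value) {
        CONTEXT.get().put("op", "put");
        try {
            log("put started");
            writeRow(key);
        } finally {
            // Always clear, even if put() throws, so the tag does not
            // leak into unrelated work on the same thread.
            CONTEXT.get().remove("op");
        }
    }

    public static void main(String[] args) {
        put("k1", "v1");
        log("unrelated work");
    }
}
```

Running main prints two lines tagged with [op=put] followed by an untagged line, so everything executed inside put() can be picked out of the log by that tag.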

Disclaimer: My debugging priority list goes like this:

1. examine stack trace
2. examine log files
3. use a debugger

So, if 1. and 2. don't give me an answer, I will resort to a debugger.

In a distributed app like this, using a debugger sounds like a last resort thing.

Answer 3:

Logging in a distributed application is indeed one of the best ways to figure out what actually happens on a wider scale and how things interact. But you will eventually face a problem with log files: distributed systems can generate lots of them, in various formats and locations. So if you want to use log4j (or similar) for this, you should aggregate the logs into one place and then study them.

This tool might help: it allows not only persisted aggregation but also real-time monitoring of an aggregated log stream from various sources. For example, you can focus on the data layer of a particular host (or range of hosts) and observe in real time what's going on. Alternatively, you can fetch logs from a particular thread on a particular machine, or use the MDC context as mentioned by the previous poster.

I also subscribe to the view that a debugger is useless most of the time in distributed apps, and totally useless in production systems for obvious reasons. Log4j, on the other hand, is incredibly flexible, widely used, and one of the best tools (IMHO) for logging.
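To make the aggregation idea concrete, here is a small sketch, not tied to any particular tool, that merges already-parsed log entries from several hosts into one timeline and then narrows it to one thread on one host. The Entry class, the host names, and the messages are all invented for illustration. Real logs would first need parsing, and clocks on different machines are rarely perfectly in sync, so the merged order is only approximate.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

class LogMergeSketch {
    // Hypothetical parsed log entry; the fields are assumptions for this sketch.
    static final class Entry {
        final long timestampMillis;
        final String host;
        final String thread;
        final String message;

        Entry(long timestampMillis, String host, String thread, String message) {
            this.timestampMillis = timestampMillis;
            this.host = host;
            this.thread = thread;
            this.message = message;
        }

        @Override
        public String toString() {
            return timestampMillis + " " + host + " [" + thread + "] " + message;
        }
    }

    // Combine the per-host logs into one list ordered by timestamp.
    static List<Entry> merge(List<List<Entry>> perHostLogs) {
        return perHostLogs.stream()
                .flatMap(List::stream)
                .sorted(Comparator.comparingLong((Entry e) -> e.timestampMillis))
                .collect(Collectors.toList());
    }

    // Keep only the entries from one thread on one host.
    static List<Entry> select(List<Entry> entries, String host, String thread) {
        return entries.stream()
                .filter(e -> e.host.equals(host) && e.thread.equals(thread))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Entry> nodeA = Arrays.asList(
                new Entry(100, "node-a", "write-1", "request received"),
                new Entry(300, "node-a", "write-1", "ack sent"));
        List<Entry> nodeB = Arrays.asList(
                new Entry(200, "node-b", "repl-1", "replica write applied"));

        List<Entry> merged = merge(Arrays.asList(nodeA, nodeB));
        merged.forEach(System.out::println);

        System.out.println("--- node-a / write-1 only ---");
        select(merged, "node-a", "write-1").forEach(System.out::println);
    }
}
```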

Answer 4:

Use logs; increase log levels if required and add more log statements at different components of the distributed system. Profile the different components, such as the database and the application server. Analyze stack traces. If it is a web app, use the browser's built-in debugging tools on the front end, as well as breakpoints on the back end.