When logging, when is an error fatal?

Posted 2019-03-09 17:00

Question:

In logging frameworks like log4j and log4net, you can log at various levels. Most of the levels have obvious intentions (such as what a "Debug" log is vs. an "Error"). However, one thing I have always been hesitant about is classifying a log entry as "Fatal".

What types of errors are so severe that they should be classified as fatal? While this depends somewhat on the case, what rules of thumb do you use when deciding between logging an exception as fatal or simply as an error?

Answer 1:

I consider fatal errors to be when your application can't do any more useful work. Non-fatal errors are when there's a problem but your application can still continue to function, even at a reduced level of functionality or performance.

Examples of fatal errors include:

  • Running out of disk space on the logging device when you are required to keep logging.
  • Total loss of network connectivity in a client application.
  • Missing configuration information if no default can be used.

Non-fatal errors would include:

  • A server where a single session fails for some reason but you can still service other clients.
  • An intermittent error, such as a lost session, if a new session can be established.
  • Missing configuration information if a default value can be used.
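The configuration case from both lists can be sketched in Python, whose standard `logging` module calls its highest level `CRITICAL` (with `logging.FATAL` as an alias). This is only an illustration of the rule above, not code from log4j or log4net; the logger name, the `DEFAULTS` table, the setting names, and `FatalConfigError` are all hypothetical.

```python
import logging

logger = logging.getLogger("app.config")  # hypothetical logger name


class FatalConfigError(Exception):
    """Raised when a required setting is missing and has no default."""


# Hypothetical settings: name -> default value (None means "no usable default").
DEFAULTS = {"timeout_seconds": 30, "database_url": None}


def get_setting(config, name):
    """Return a setting, logging at a level that matches the impact."""
    if name in config:
        return config[name]
    default = DEFAULTS.get(name)
    if default is not None:
        # The application can keep working with the default: not fatal.
        logger.error("Missing setting %r, falling back to default %r", name, default)
        return default
    # No useful work is possible without this value: fatal.
    logger.critical("Missing required setting %r and no default exists", name)
    raise FatalConfigError(name)
```

Raising after the fatal log entry lets the caller decide how to shut down, while the log already records why.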


Answer 2:

An error is fatal if something is missing, or a situation occurs, such that the application simply cannot continue. Possible examples are a missing required config file, or an exception that bubbles up and is caught by the unhandled exception handler.
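As a rough sketch of the second example, Python lets you install a last-resort hook through `sys.excepthook`. Anything that reaches it was not handled anywhere else, so logging it at the highest level is a reasonable default. The logger name and the function name `log_unhandled` are assumptions made for illustration.

```python
import logging
import sys

logger = logging.getLogger("app")  # hypothetical logger name


def log_unhandled(exc_type, exc_value, exc_traceback):
    """Last-resort hook: an exception that reaches here ends the program."""
    if issubclass(exc_type, KeyboardInterrupt):
        # Let Ctrl+C behave as usual instead of reporting it as a failure.
        sys.__excepthook__(exc_type, exc_value, exc_traceback)
        return
    logger.critical(
        "Unhandled exception, application cannot continue",
        exc_info=(exc_type, exc_value, exc_traceback),
    )


# Replace the default hook, which only prints the traceback to stderr.
sys.excepthook = log_unhandled
```

Note that this hook covers the main thread only; other threads report through `threading.excepthook`.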



Answer 3:

I would use fatal if the application's next step is to terminate, or simply to stop doing any further work. If the application is part of a batch, or there are multiple processes running, this can be useful for tracing what happened.

If there is a chance of recovery (e.g., loss of network connection with retries for a while), I would not use fatal.

If I have multiple service threads activated by a main thread and one of them fails because of some bad input but the application can still serve new requests, I do not consider it fatal.
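The retry case might look something like this in Python: each failed attempt is logged as an error while recovery is still possible, and the fatal level is reserved for the point where the retries run out. The function `connect_with_retries`, its parameters, and the logger name are hypothetical, and `connect` stands in for whatever call actually opens the connection.

```python
import logging
import time

logger = logging.getLogger("app.network")  # hypothetical logger name


def connect_with_retries(connect, attempts=3, delay_seconds=1.0):
    """Call connect() until it succeeds or the attempts run out.

    `connect` is any zero-argument callable that raises ConnectionError
    on failure and returns a connection object on success.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionError as exc:
            if attempt == attempts:
                # No chance of recovery is left: this is the fatal case.
                logger.critical("Giving up after %d attempts: %s", attempts, exc)
                raise
            # Recovery is still possible, so this is an error, not fatal.
            logger.error("Attempt %d of %d failed: %s; retrying", attempt, attempts, exc)
            time.sleep(delay_seconds)
```

Whether a failed attempt deserves an error or only a warning is a judgment call; the point is that the fatal entry appears once, at the moment the application gives up.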



Answer 4:

To keep this answer short and sweet: if your application crashes, I would consider that fatal. If you cannot connect to an important resource, such as a database or a required service, that would be fatal too. Overall, if a problem keeps your application from running correctly and affects the user, I would classify it as a fatal error.

But the most important way to classify errors is to consistently follow a rule of thumb such as rule 69 in C++ Coding Standards:

"Develop a practical, consistent, and rational error handling policy early in design, and then stick to it."