Most advice concerning error handling boils down to a handful of tips and tricks (see this post for example). These hints are helpful but I think they don't answer all questions. I feel that I should design my application according to a certain philosophy, a school of thought that provides a strong foundation to build upon. Is there such a theory on the topic of error handling?
Here are a few practical questions:
- How to decide if an error should be handled locally or propagated to higher level code?
- How to decide between logging an error, or showing it as an error message to the user?
- Is logging something that should only be done in application code? Or is it ok to do some logging from library code?
- In case of exceptions, where should you generally catch them? In low-level or higher level code?
- Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries)?
- Does it make sense to create a list of error codes? Or is that old fashioned these days?
In many cases common sense is sufficient for developing a good-enough strategy to deal with error conditions. However, I would like to know if there is a more formal/"scholarly" approach?
PS: this is a general question, but C++ specific answers are welcome too (C++ is my main programming language for work).
Here is an awesome blog post which explains how error handling should be done. http://damienkatz.net/2006/04/error_code_vs_e.html
How to decide if an error should be handled locally or propagated to higher level code? As Martin Becket says in another answer, this is a question of whether the error can be fixed here or not.
How to decide between logging an error, or showing it as an error message to the user? You should probably never show a raw error to the user. Rather, show them a well-formed message explaining the situation, without giving too much technical information. Then log the technical information, especially if the error occurred while processing input. If your code doesn't know how to handle faulty input, then that MUST be fixed.
Is logging something that should only be done in application code? Or is it ok to do some logging from library code? Logging in library code is of limited use, because you may not even have written that code. However, the application can log its interactions with the library code, and can even detect errors through statistics on those interactions.
In case of exceptions, where should you generally catch them? In low-level or higher level code? See question one.
Similar question: at what point should you stop propagating an error and deal with it? See question one.
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries)? Throwing an exception is an expensive operation in most heavyweight languages, so reserve exceptions for cases where the entire program flow is broken for that operation. On the other hand, if you can predict all outcomes of a function, pass any data back through a reference parameter and return an error code (0 on success, 1 or higher on errors).
Does it make sense to create a list of error codes? Or is that old fashioned these days? Make a list of error codes for each particular function, and document it with that function as the list of its possible return values. See the previous question as well as the link.
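As a sketch of the error-code style described above, in C++: the function `parse_port` and its specific codes are hypothetical, chosen only to show the shape (0 on success, positive codes on failure, result delivered through a reference parameter).

```cpp
#include <string>

// Returns 0 on success and a positive code on failure. The parsed value is
// delivered through the reference parameter `out`.
//   1: empty input   2: not a number   3: out of range (above 65535)
int parse_port(const std::string& text, int& out) {
    if (text.empty()) return 1;
    int value = 0;
    for (char c : text) {
        if (c < '0' || c > '9') return 2;
        value = value * 10 + (c - '0');
        if (value > 65535) return 3;   // checked every step, so no overflow
    }
    out = value;                       // only written on success
    return 0;
}
```

Because every outcome is predictable, the caller can handle them all with a plain `switch` on the return value, and `out` is left untouched when the call fails.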
My view on logging (or other actions) from library code is NEVER.
A library should not impose policy on its user, and the user may have INTENDED an error to occur. Perhaps the program was deliberately soliciting a particular error, in the expectation of it arriving, to test some condition. Logging this error would be misleading.
Logging (or anything else) imposes policy on the caller, which is bad. Moreover, if a harmless error condition (one that the caller would simply ignore or retry, for example) were to happen with high frequency, the volume of logs could mask any legitimate errors or cause robustness problems (filling discs, using excessive IO, etc.).
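To illustrate, here is a small C++17 sketch in which the "library" function reports failure only through its return value and the caller decides what, if anything, to do about it. Both function names are hypothetical stand-ins:

```cpp
#include <optional>
#include <string>
#include <unordered_map>

// "Library" side: reports a miss through the return value and never logs.
std::optional<std::string> lookup(
        const std::unordered_map<std::string, std::string>& table,
        const std::string& key) {
    auto it = table.find(key);
    if (it == table.end()) return std::nullopt;
    return it->second;
}

// "Application" side: a miss is expected here, so the caller quietly
// falls back to a default. Another caller might log or abort instead.
std::string setting_or_default(
        const std::unordered_map<std::string, std::string>& table,
        const std::string& key, const std::string& fallback) {
    return lookup(table, key).value_or(fallback);
}
```

If `lookup` logged every miss, a caller that probes for optional settings in a loop would flood the log with "errors" that are entirely intended.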
1. Always handle an error as soon as possible. The closer you are to its occurrence, the better your chance of doing something meaningful about it, or at the very least of figuring out where and why it happened. In C++ this is not just a matter of losing context; in many cases the cause becomes impossible to determine later.
2. In general you should always halt the app if something buggy occurs that is a real error (not something like failing to find a file, which is labeled an error but should not really count as one). It's not going to just sort itself out, and once the app is broken it will cause errors that are impossible to debug because they have nothing to do with the area where they occur.
3. Why not?
4. See 1.
5. See 1.
6. You need to keep things simple, or you will regret it. More important than handling bugs at runtime is testing to avoid them.
7. It's like asking whether it is better to centralize or not. It might make a lot of sense in some cases but be a waste of time in others. For something that is a loadable lib/module of some kind that can have data-related errors (garbage in, garbage out), it makes tons of sense. For more general error handling or catastrophic errors, less so.
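The distinction drawn above between a real bug (halt) and an expected failure such as a missing file (just report it) could be sketched in C++17 like this. `read_first_line` and `check_invariant` are hypothetical helpers, not standard functions:

```cpp
#include <cstdio>
#include <cstdlib>
#include <fstream>
#include <optional>
#include <string>

// Expected failure: a missing file is an ordinary outcome, so it is
// reported to the caller through the return value.
std::optional<std::string> read_first_line(const std::string& path) {
    std::ifstream in(path);
    if (!in) return std::nullopt;
    std::string line;
    std::getline(in, line);
    return line;
}

// Real bug: a broken invariant will not sort itself out, so stop right
// here, while the failure still points at the code that caused it.
void check_invariant(bool ok, const char* what) {
    if (!ok) {
        std::fprintf(stderr, "fatal: invariant violated: %s\n", what);
        std::abort();
    }
}
```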
The first question is probably what can you do about the error?
Can you fix it (in which case, do you need to tell the user at all?) or can the user fix it?
If nobody can fix it and you are going to exit, is there any value in having this reported back to you (through a crash dump or error code)?
A couple of years ago I thought about exactly the same question :)
After searching and reading several things, I think the most interesting reference I found was "Patterns for Generation, Handling and Management of Errors" by Andy Longshaw and Eoin Woods. It is a short and systematic attempt to cover the basic idioms you mention, and some others.
The answer to these questions is quite controversial, but the authors above were brave enough to put their views forward at a conference, and then to put their thoughts on paper.
How to decide if an error should be handled locally or propagated to higher level code?
Error handling should be done at the highest affected level. If it only impacts the lower level code, then it should be handled there. If the error affects higher level code, then the error needs to be handled at the higher level. This is to prevent some higher level code from going on its merry way after an error has caused its actions to be incorrect. It should know what is going on, provided it is impacted.
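One way to picture "the highest affected level" in C++ is with exceptions that pass through the layers that are not affected and are caught in the layer whose behaviour actually changes. All three function names below are made up for illustration:

```cpp
#include <stdexcept>
#include <string>

// Low level: cannot know what a bad value means for the caller, so it throws.
int parse_percent(const std::string& text) {
    int value = std::stoi(text);   // throws std::invalid_argument on non-numeric input
    if (value < 0 || value > 100) {
        throw std::out_of_range("percent out of range: " + text);
    }
    return value;
}

// Middle level: not affected beyond passing the failure on, so no try/catch.
double to_fraction(const std::string& text) {
    return parse_percent(text) / 100.0;
}

// High level: this is where the failure changes what happens next, so it
// is handled here. Both exception types derive from std::logic_error.
double volume_from_user_input(const std::string& text, double current) {
    try {
        return to_fraction(text);
    } catch (const std::logic_error&) {
        return current;            // keep the previous volume
    }
}
```

The top-level function never "goes on its merry way" with a bogus value: it either gets a valid fraction or knows that it has to fall back.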
How to decide between logging an error, or showing it as an error message to the user?
You should always log the error. You should show the error to the user when they are affected by it. If it is something they will never notice and that has no direct impact on them (e.g. two sockets failed to open before the third finally opened, resulting in only a very short delay), then they should not be notified.
Is logging something that should only be done in application code? Or is it ok to do some logging from library code?
Too much logging is rarely a bad thing. You will regret not logging things when you have to hunt down a library bug more than you will be frustrated by extra logs when hunting down other bugs.
In case of exceptions, where should you generally catch them? In low-level or higher level code?
Similar to error handling above, it should be caught where the impact is, and where the error can be corrected/handled effectively. This will vary from case to case.
Should you strive for a unified error handling strategy through all layers of code, or try to develop a system that can adapt itself to a variety of error handling strategies (in order to be able to deal with errors from 3rd party libraries)?
This is largely a personal decision. My internal error handling is much different than the error handling I use for anything that touches a third party library. I have a general idea of what to expect from my code, but the third party stuff could have anything happen to it.
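One common way to keep the two styles apart, shown here as a C++ sketch rather than a recommendation, is to translate third-party errors into your own error type at the boundary. `vendor_connect` is a made-up stand-in for a third-party C-style call, and `ConnectionError` is a hypothetical application-level exception:

```cpp
#include <stdexcept>
#include <string>

// Stand-in for a third-party C-style API (hypothetical): returns 0 on
// success and a negative vendor-specific code on failure.
int vendor_connect(const char* host) {
    return (host != nullptr && host[0] != '\0') ? 0 : -7;
}

// The application's own error type, used everywhere inside our code.
class ConnectionError : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Boundary wrapper: the only place that knows about vendor codes.
void connect_or_throw(const std::string& host) {
    int rc = vendor_connect(host.c_str());
    if (rc != 0) {
        throw ConnectionError("connect to '" + host +
                              "' failed (vendor code " + std::to_string(rc) + ")");
    }
}
```

The rest of the code base then only ever sees `ConnectionError`, whatever the library underneath happens to do.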
Does it make sense to create a list of error codes? Or is that old fashioned these days? It depends on how many errors you expect to be raised. You might love your list of error codes if you spend a lot of time bug hunting, as they can help point you in the right direction. However, any time spent building the list is time not spent on coding or bug fixing, so it's a mixed bag. This largely comes down to personal preference.