I'm processing unicode strings and want to do some error reporting with standard exceptions. The problem is that the error messages carried by standard exceptions are not unicode: what() returns a narrow const char*.
Usually that hasn't been a problem, because I can write the error message in plain (non-unicode) text and still convey enough information. In this case, however, I want to include data from the original strings, and that data can contain unicode characters.
How do you handle unicode messages in your exceptions? Do you create your own custom exception class, derive from the standard exceptions and extend them to support unicode, or use some other solution entirely (such as a rule "don't use unicode in exceptions")?
I think Peter Dimov's rationale, as cited in the Boost error-handling guidelines, covers this well:
Don't worry too much about the what() message. It's nice to have a message that a programmer stands a chance of figuring out, but you're very unlikely to be able to compose a relevant and user-comprehensible error message at the point an exception is thrown. Certainly, internationalization is beyond the scope of the exception class author. Peter Dimov makes an excellent argument that the proper use of a what() string is to serve as a key into a table of error message formatters. Now if only we could get standardized what() strings for exceptions thrown by the standard library...
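A minimal sketch of the "what() as a key" idea, assuming a hypothetical KeyedError class and a hypothetical message_table(); the keys and message texts are made up for illustration. The exception carries only a stable narrow key, and the wide, user-facing text is produced outside the exception by looking that key up.

```cpp
#include <exception>
#include <functional>
#include <map>
#include <string>

// Hypothetical exception whose what() is a stable, language-neutral key
// rather than a user-facing message.
class KeyedError : public std::exception {
public:
    explicit KeyedError(const char* key) : key_(key) {}
    const char* what() const noexcept override { return key_; }
private:
    const char* key_;  // must point to a string literal (static storage)
};

// A formatter turns an exception into a wide, user-facing message.
using Formatter = std::function<std::wstring(const std::exception&)>;

// Hypothetical key -> formatter table; a real one would be per language.
const std::map<std::string, Formatter>& message_table() {
    static const std::map<std::string, Formatter> table = {
        {"parse.unexpected_char",
         [](const std::exception&) { return std::wstring(L"Unexpected character in input"); }},
        {"parse.unexpected_end",
         [](const std::exception&) { return std::wstring(L"Input ended too early"); }},
    };
    return table;
}

// Look up the message for an exception, falling back to a generic text
// when the key is not in the table.
std::wstring describe(const std::exception& e) {
    const auto& table = message_table();
    auto it = table.find(e.what());
    if (it != table.end()) return it->second(e);
    return L"Unknown error";
}
```

Passing the exception to the formatter leaves room for a formatter to pull extra data out of a more specific exception type, which is where the unicode input could come in.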
(I'm adding an answer to my own question after gaining an insight from Flodin's answer.)
In my case I am parsing a string that may contain unicode characters and is expected to follow a certain format. Parsing may fail, in which case I throw an exception to report the problem.
Originally I intended to put a programmer-readable message inside the exception showing the contents of the string where parsing failed. That's where I ran into trouble, because the message of a standard exception is a narrow char string and cannot directly hold my unicode text.
However, the new design I'm considering is to report the location of the parsing error in the string through the exception itself, using a std::exception-derived class. Building a programmer-readable message that quotes the offending part of the string can then be delegated to a handler outside the class, which still has access to the original string. This feels like a much cleaner design to me.
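A sketch of that design, assuming std::wstring as the unicode string type; ParseError, parse_digits() and format_parse_error() are hypothetical names, and the "digits only" format is invented for illustration. The exception only carries the offset, and the handler, which still has the original input, quotes the offending text.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// std::exception-derived class that carries *where* parsing failed,
// not a pre-formatted message built from the (possibly unicode) input.
class ParseError : public std::runtime_error {
public:
    explicit ParseError(std::size_t position)
        : std::runtime_error("parse error"), position_(position) {}
    std::size_t position() const noexcept { return position_; }
private:
    std::size_t position_;
};

// Hypothetical parser: accepts only decimal digits and throws at the
// first offending character (no overflow check in this sketch).
unsigned long parse_digits(const std::wstring& input) {
    unsigned long value = 0;
    for (std::size_t i = 0; i < input.size(); ++i) {
        wchar_t c = input[i];
        if (c < L'0' || c > L'9') throw ParseError(i);
        value = value * 10 + static_cast<unsigned long>(c - L'0');
    }
    return value;
}

// Handler outside the exception class: it has the original string, so it
// can quote up to 8 characters starting at the error position.
std::wstring format_parse_error(const std::wstring& input, const ParseError& e) {
    std::size_t pos = e.position();
    return L"Parse error at offset " + std::to_wstring(pos) +
           L": \"" + input.substr(pos, 8) + L"\"";
}
```

The exception stays cheap and encoding-agnostic; only the catch site decides how (and in which encoding) the message is shown.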
Thank you for the input, everyone!
If you really want Unicode, you could UTF-8 encode the exception message and put a BOM at the beginning, so that when you prepare the message for output you can tell whether it is UTF-8, raw char, or some other encoding.
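A sketch of the UTF-8-plus-BOM idea, assuming the message starts out as a std::u32string of code points; to_utf8(), make_utf8_error() and is_utf8_message() are hypothetical helpers. The BOM is only a convention here: nothing in std::exception knows about it, so every place that prints what() has to check for it.

```cpp
#include <stdexcept>
#include <string>

// Encode Unicode code points as UTF-8 (this sketch does not reject
// surrogates or values above U+10FFFF).
std::string to_utf8(const std::u32string& text) {
    std::string out;
    for (char32_t cp : text) {
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}

// The UTF-8 byte order mark: EF BB BF.
const std::string kUtf8Bom = "\xEF\xBB\xBF";

// Build a standard exception whose what() is BOM-prefixed UTF-8.
std::runtime_error make_utf8_error(const std::u32string& message) {
    return std::runtime_error(kUtf8Bom + to_utf8(message));
}

// At output time, check for the marker before deciding how to print.
bool is_utf8_message(const char* what) {
    return std::string(what).compare(0, kUtf8Bom.size(), kUtf8Bom) == 0;
}
```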
We use our own exception class. If that's not possible, you can always translate from Unicode to MBCS in the current charset. You usually need this text only for a short while, so further conversion is not an issue.
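A sketch of the "translate to the current charset" approach using the standard std::wcrtomb, which converts according to the current C locale (LC_CTYPE, as set with std::setlocale). to_current_mbcs() is a hypothetical name, and replacing unconvertible characters with '?' is an assumption; the answer does not say how such characters should be handled.

```cpp
#include <climits>
#include <cwchar>
#include <string>

// Convert a wide string to the multibyte encoding of the current C locale,
// substituting '?' for characters the current charset cannot represent.
// Note: where wchar_t is UTF-16 (e.g. Windows), surrogate pairs are fed to
// wcrtomb one unit at a time and will typically fail in this sketch.
std::string to_current_mbcs(const std::wstring& text) {
    std::string out;
    std::mbstate_t state{};
    char buf[MB_LEN_MAX];
    for (wchar_t wc : text) {
        std::size_t n = std::wcrtomb(buf, wc, &state);
        if (n == static_cast<std::size_t>(-1)) {
            out += '?';
            state = std::mbstate_t{};  // state is unspecified after an error
        } else {
            out.append(buf, n);
        }
    }
    return out;
}
```

The result can then be passed to a standard exception such as std::runtime_error, with the caveat that characters outside the current charset are lost.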
I would suggest deriving from std::exception and extending it to use your unicode string class.
Deriving from std::exception lets you write
catch (std::exception&) ...
as your last catch clause and have it catch any exception you might have thrown, as well as those thrown by the standard library. If you instead create your own base exception class (and derive your other exceptions from it), you need an additional catch clause for it.
Either way, I don't think it really matters, but I prefer this style. (Depending on the implementation, the std::exception base may carry some unused storage for a narrow message, but I don't think that makes a big difference.)
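A sketch of this style, assuming std::wstring stands in for "your unicode string class"; UnicodeError, ParseFailure, wide_what() and run() are hypothetical names. The specific handler comes first and catch (const std::exception&) last, so the last clause also picks up standard library exceptions such as std::out_of_range.

```cpp
#include <exception>
#include <string>
#include <utility>

// Base exception carrying a wide message; std::wstring stands in for
// "your unicode string class".
class UnicodeError : public std::exception {
public:
    explicit UnicodeError(std::wstring message) : message_(std::move(message)) {}
    // Narrow, programmer-oriented fallback for generic catch sites.
    const char* what() const noexcept override { return "UnicodeError (see wide_what())"; }
    const std::wstring& wide_what() const noexcept { return message_; }
private:
    std::wstring message_;
};

// Concrete exceptions derive from the unicode-aware base.
class ParseFailure : public UnicodeError {
public:
    using UnicodeError::UnicodeError;
};

// Catch order: the unicode-aware handler first, std::exception last.
std::wstring run(void (*action)()) {
    try {
        action();
    } catch (const UnicodeError& e) {
        return L"unicode: " + e.wide_what();
    } catch (const std::exception& e) {
        std::string narrow = e.what();
        // Widening char by char assumes what() is plain ASCII.
        return L"std: " + std::wstring(narrow.begin(), narrow.end());
    }
    return L"ok";
}
```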