What's the Pythonic way to report nonfatal errors in a parser?

Posted 2019-02-13 07:49

Question:

A parser I created reads recorded chess games from a file. The API is used like this:

import chess.pgn

pgn_file = open("games.pgn")

first_game = chess.pgn.read_game(pgn_file)
second_game = chess.pgn.read_game(pgn_file)
# ...

Sometimes illegal moves (or other problems) are encountered. What is a good Pythonic way to handle them?

  • Raising exceptions as soon as the error is encountered. However, this makes every problem fatal, in that execution stops. Often, there is still useful data that has been parsed and could be returned. Also, you cannot simply continue parsing the next data set, because the parser is still in the middle of some half-read data.

  • Accumulating exceptions and raising them at the end of the game. This makes the error fatal again, but at least you can catch it and continue parsing the next game.

  • Introducing an optional argument like this:

    game = chess.pgn.read_game(pgn_file, parser_info)
    if parser_info.error:
        # This appears to be quite verbose.
        # Now you can at least make the best of the successfully parsed parts.
        ...
    

Are some of these or other methods used in the wild?

Answer 1:

Actually, those are fatal errors, at least as far as reproducing a correct game is concerned. On the other hand, maybe the player actually did make the illegal move and nobody noticed at the time, which would make it a warning rather than a fatal error.

Given that both fatal errors (the file is corrupted) and warnings (an illegal move was made, but subsequent moves are consistent with it; in other words, a user error that nobody caught at the time) are possible, I recommend a combination of the first and second options:

  • raise an exception when continued parsing isn't an option
  • collect any errors/warnings that don't preclude further parsing, and report them at the end

If you don't encounter a fatal error, then you can return the game, plus any warnings/non-fatal errors, at the end:

return game, warnings, errors

But what if you do hit a fatal error?

No problem: create a custom exception to which you can attach the usable portion of the game and any warnings/non-fatal errors:

raise ParsingError(
    'error explanation here',
    game=game,
    warnings=warnings,
    errors=errors,
    )

Then, when you catch the error, you can access the recoverable portion of the game, along with the warnings and errors.

The custom error might be:

class ParsingError(Exception):
    def __init__(self, msg, game, warnings, errors):
        super().__init__(msg)
        self.game = game
        self.warnings = warnings
        self.errors = errors

and in use:

try:
    first_game, warnings, errors = chess.pgn.read_game(pgn_file)
except chess.pgn.ParsingError as err:
    first_game = err.game
    warnings = err.warnings
    errors = err.errors
    # whatever else you want to do to handle the exception

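This is similar to how the subprocess module handles errors: subprocess.CalledProcessError carries the failed command, its return code, and any captured output as attributes.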
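To see the whole pattern in one place, here is a minimal self-contained sketch. It does not use python-chess or real PGN: `parse_game`, `MOVE_RE` and the one-move-per-line toy format are hypothetical, and the parser takes a list of lines rather than a file. Unreadable lines are treated as fatal, uppercase squares as warnings, and a "move" to the same square as a nonfatal error.

```python
import re


class ParsingError(Exception):
    """Fatal parse error that still carries everything recovered so far."""

    def __init__(self, msg, game, warnings, errors):
        super().__init__(msg)
        self.game = game
        self.warnings = warnings
        self.errors = errors


# Hypothetical toy syntax: one move per line, e.g. "e2-e4".
MOVE_RE = re.compile(r"^[a-hA-H][1-8]-[a-hA-H][1-8]$")


def parse_game(lines):
    """Return (moves, warnings, errors), or raise ParsingError on a fatal problem."""
    moves, warnings, errors = [], [], []
    for lineno, line in enumerate(lines, start=1):
        token = line.strip()
        if not MOVE_RE.match(token):
            # Fatal: the rest of this game cannot be interpreted reliably.
            raise ParsingError(
                f"line {lineno}: unreadable move {token!r}",
                game=moves, warnings=warnings, errors=errors,
            )
        if token != token.lower():
            # Warning: harmless irregularity, normalised and kept.
            warnings.append(f"line {lineno}: normalised {token!r}")
            token = token.lower()
        if token[:2] == token[3:]:
            # Nonfatal error: a "move" to the same square is dropped.
            errors.append(f"line {lineno}: null move {token!r} skipped")
            continue
        moves.append(token)
    return moves, warnings, errors


try:
    game, warnings, errors = parse_game(["e2-e4", "E7-E5", "g1-g1", "g1-f3"])
except ParsingError as err:
    # The usable part of the game survives the fatal error.
    game, warnings, errors = err.game, err.warnings, err.errors
    print("fatal:", err)
print(game, warnings, errors)
```

The caller handles both outcomes with the same three names, so code after the `try` does not need to know whether parsing finished.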

To be able to retrieve and parse subsequent games after a fatal error in one game, I would suggest a change to your API:

  • have a game iterator that simply returns the raw data for each game (it only has to know how to tell when one game ends and the next begins)
  • have the parser take that raw game data and parse it (so it's no longer in charge of where in the file you happen to be)

This way, if you have a five-game file and game 2 fails, you can still attempt to parse games 3, 4, and 5.
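The split can be sketched with two hypothetical functions: `iter_raw_games`, which only knows where one game ends and the next begins, and `parse_raw_game`, which parses one chunk and may raise. As a simplifying assumption, games here are separated by blank lines and each line is a toy move like "e2-e4"; real PGN also has a blank line between the tag section and the movetext, so a real splitter would need a better boundary rule.

```python
class ParsingError(Exception):
    """Raised when a single game cannot be parsed."""


def iter_raw_games(lines):
    """Yield each game as a list of lines (assumes blank-line separators)."""
    chunk = []
    for line in lines:
        if line.strip():
            chunk.append(line.rstrip("\n"))
        elif chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk


def parse_raw_game(raw_lines):
    """Toy parser: every line must look like 'e2-e4'."""
    moves = []
    for line in raw_lines:
        if len(line) != 5 or line[2] != "-":
            raise ParsingError(f"bad move {line!r}")
        moves.append(line)
    return moves


def parse_all(lines):
    """Parse every game, recording failures instead of stopping at the first one."""
    games, failures = [], []
    for index, raw in enumerate(iter_raw_games(lines), start=1):
        try:
            games.append(parse_raw_game(raw))
        except ParsingError as err:
            failures.append((index, err))
    return games, failures


text = """e2-e4
e7-e5

d2-d4
garbage

c2-c4
"""
games, failures = parse_all(text.splitlines())
for index, err in failures:
    print(f"game {index} skipped: {err}")
```

Because the iterator never depends on the parser's state, a failure inside game 2 cannot leave the reader stranded in the middle of the file.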



Answer 2:

The most Pythonic way is the logging module. It has been mentioned in the comments, but unfortunately without enough emphasis. There are many reasons it's preferable to the warnings module:

  1. The warnings module is intended to report warnings about potential issues in code, not bad user data. (The logging HOWTO recommends warnings.warn() when the client code should be changed to eliminate the warning, and logging's warning() when there is nothing the client can do but the event should still be noted.)
  2. The first reason is actually enough. :-)
  3. The logging module provides adjustable message severity: not only warnings, but anything from debug messages to critical errors can be reported.
  4. You have full control over the output of the logging module. Messages can be filtered by source, contents and severity, formatted in any way you wish, and sent to different output targets (console, pipes, files, memory, etc.).
  5. The logging module separates the reporting of errors/warnings/messages from their output: your code can generate messages of the appropriate type and doesn't have to care how they're presented to the end user.
  6. The logging module is the de facto standard for Python code. Practically everyone uses it, so if your code uses it, combining it with third-party code (which likely uses logging too) will be a breeze. Well, maybe something stronger than a breeze, but definitely not a category 5 hurricane. :-)

A basic use of the logging module would look like this:

import logging
logger = logging.getLogger(__name__) # module-level logger

# (tons of code)
logger.warning('illegal move: %s in file %s', move, file_name)
# (more tons of code)

If the application has called logging.basicConfig() with its default settings, this will print messages like:

WARNING:chess_parser:illegal move: a2-b7 in file parties.pgn

(assuming your module is named chess_parser.py)

The most important thing is that you don't need to do anything else in your parser module. You declare that you're using the logging system, that you're using a logger with a specific name (the same as your parser module's name in this example), and that you're sending warning-level messages to it. Your module doesn't have to know how these messages are processed, formatted and reported to the user, or whether they're reported at all. For example, you can configure the logging module (usually at the very start of your program) to use a different format and write the messages to a file:

logging.basicConfig(filename='parser.log', format='%(name)s [%(levelname)s] %(message)s')

And suddenly, without any changes to your module code, your warning messages are saved to a file in a different format instead of being printed to the screen:

chess_parser [WARNING] illegal move: a2-b7 in file parties.pgn

Or you can suppress warnings if you wish:

logging.basicConfig(level=logging.ERROR)

And your module's warnings will be ignored completely, while any ERROR or higher-level messages from your module will still be processed.
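Because the parser only emits records, the caller can also capture them as data, for example to store them alongside each parsed game. A minimal sketch, assuming the `chess_parser` logger name used above; `parse_moves`, `ILLEGAL` and `ListHandler` are hypothetical names, and the legality check is only a placeholder.

```python
import logging

# Parser side: only emits log records and knows nothing about output.
logger = logging.getLogger("chess_parser")
ILLEGAL = {"a2-b7"}  # hypothetical stand-in for a real legality check


def parse_moves(moves, file_name):
    """Toy parser: keeps every move, logging a warning for illegal ones."""
    for move in moves:
        if move in ILLEGAL:
            logger.warning("illegal move: %s in file %s", move, file_name)
    return list(moves)


# Caller side: collects the records as data instead of printing them.
class ListHandler(logging.Handler):
    """Store every record that reaches this handler."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


handler = ListHandler()
logger.addHandler(handler)
logger.propagate = False  # don't also pass the records to the root logger
try:
    moves = parse_moves(["e2-e4", "a2-b7"], "parties.pgn")
finally:
    logger.removeHandler(handler)
    logger.propagate = True

for record in handler.records:
    print(record.levelname, record.getMessage())
```

Setting `propagate = False` is optional; it only keeps the captured records from also reaching any handlers configured with `basicConfig()`. The parser code is identical either way.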



Answer 3:

I offered the bounty because I wanted to know if this is really the best way to do it. Still, since I'm also writing a parser and need this functionality, here is what I've come up with.


The warnings module is exactly what you want.

What turned me away from it at first was that every example warning in the docs looks like this:

Traceback (most recent call last):
  File "warnings_warn_raise.py", line 15, in <module>
    warnings.warn('This is a warning message')
UserWarning: This is a warning message

...which is undesirable because I don't want it to be a UserWarning; I want my own custom warning name.

Here's the solution to that:

import warnings
class AmbiguousStatementWarning(Warning):
    pass

def x():
    warnings.warn("unable to parse statement syntax",
                  AmbiguousStatementWarning, stacklevel=3)
    print("after warning")

def x_caller():
    x()

x_caller()

which gives:

$ python3 warntest.py 
warntest.py:12: AmbiguousStatementWarning: unable to parse statement syntax
  x_caller()
after warning
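Because the category is a dedicated class, callers can decide what these warnings mean without changing the parser: `warnings.catch_warnings(record=True)` collects them as data, and a filter with action `"error"` turns them into exceptions. A minimal sketch reusing the `AmbiguousStatementWarning` name from above; `parse_statement` and its "??" marker are hypothetical stand-ins for a real parser step.

```python
import warnings


class AmbiguousStatementWarning(Warning):
    pass


def parse_statement(text):
    """Hypothetical parser step: warns about, but still returns, ambiguous input."""
    if "??" in text:
        warnings.warn(f"ambiguous statement: {text!r}",
                      AmbiguousStatementWarning, stacklevel=2)
    return text.replace("??", "")


# 1. Collect the warnings as objects instead of printing them.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", AmbiguousStatementWarning)
    result = parse_statement("a ?? b")

for w in caught:
    print(w.category.__name__, w.message)

# 2. Treat them as fatal, for this block only.
with warnings.catch_warnings():
    warnings.simplefilter("error", AmbiguousStatementWarning)
    try:
        parse_statement("c ?? d")
    except AmbiguousStatementWarning as err:
        print("rejected:", err)
```

`catch_warnings()` restores the previous filters on exit, so neither policy leaks into the rest of the program.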


Answer 4:

I'm not sure whether this solution is Pythonic or not, but I use it rather often, with slight modifications: the parser does its job inside a generator and yields each result together with an error value (None on success). The receiving code decides what to do with the failed items:

def process_items(items):
    for item in items:
        try:
            # process(item) and SOME_ERROR_CODE are placeholders for your own code
            processed_item = process(item)
        except Exception as err:
            yield None, (SOME_ERROR_CODE, str(err), item)
        else:
            yield processed_item, None


for processed, err in process_items(items):
    if err:
        # process and log err, collect failed items, etc.
        continue
    # further process processed
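A runnable version of the generator pattern, with a concrete stand-in for the processing step: each hypothetical item is a string that should parse as an integer, and the error tuple holds a made-up error code (`PARSE_ERROR`), the message and the failed item.

```python
PARSE_ERROR = 1  # hypothetical error code


def process_items(items):
    """Yield (result, None) on success or (None, (code, message, item)) on failure."""
    for item in items:
        try:
            processed_item = int(item)  # stand-in for the real processing
        except ValueError as err:
            yield None, (PARSE_ERROR, str(err), item)
        else:
            yield processed_item, None


failed = []
total = 0
for processed, err in process_items(["1", "two", "3"]):
    if err:
        failed.append(err)  # collect failed items for later reporting
        continue
    total += processed
```

Keeping the successful `yield` in the `else` clause means only exceptions raised by the processing step are reported as item failures; anything thrown into the generator while it is suspended at that `yield` is not silently turned into an error tuple.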

A more general approach is to use design patterns. A simplified version of Observer (where you register callbacks for specific errors) or a kind of Visitor (where the visitor has methods for processing specific kinds of errors; see the SAX parser's ErrorHandler, with its warning(), error() and fatalError() methods, for inspiration) might be a clear and well-understood solution.
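The handler variant can be sketched in the spirit of `xml.sax.handler.ErrorHandler`: the parser reports each problem to a handler object instead of deciding on its own what is fatal. The class names, the `fatal_error` method name and the toy `parse_moves` function below are hypothetical; the default handler loosely mirrors SAX's defaults (print warnings, raise on errors), and a caller subclasses it to collect problems instead.

```python
class ParseProblem(Exception):
    """Describes one problem found while parsing."""


class ParseErrorHandler:
    """Default policy: print warnings, raise on errors and fatal errors."""

    def warning(self, problem):
        print("warning:", problem)

    def error(self, problem):
        raise problem

    def fatal_error(self, problem):
        raise problem


class CollectingHandler(ParseErrorHandler):
    """Caller-supplied policy: keep going and remember nonfatal problems."""

    def __init__(self):
        self.problems = []

    def warning(self, problem):
        self.problems.append(("warning", problem))

    def error(self, problem):
        self.problems.append(("error", problem))


def parse_moves(tokens, handler=None):
    """Toy parser: the handler, not the parser, decides what each problem means."""
    if handler is None:
        handler = ParseErrorHandler()
    moves = []
    for token in tokens:
        if token == "":
            handler.fatal_error(ParseProblem("unexpected end of game"))
            break  # only reached if a handler chose not to raise
        if token.endswith("?!"):
            handler.warning(ParseProblem(f"{token}: annotation stripped"))
            token = token[:-2]
        if len(token) != 5:
            handler.error(ParseProblem(f"unreadable move {token!r}, skipped"))
            continue
        moves.append(token)
    return moves


collector = CollectingHandler()
moves = parse_moves(["e2-e4", "e7-e5?!", "xx", "g1-f3"], collector)
```

The same parser is strict by default and lenient when the caller passes a collecting handler, which is exactly the policy decision the question wants to move out of the parser.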



Answer 5:

Without libraries, it is difficult to do this cleanly, but still possible.

There are different methods of handling this, depending on the situation.

Method 1:

Put the entire contents of the while loop inside a try/except block, like this:

while True:
    try:
        ...  # your loop body goes here
    except Exception as detail:
        print(detail)

Method 2:

Same as Method 1, except with multiple try/except blocks, so that an error doesn't skip too much code and you know exactly where it occurred.

Sorry, in a rush, hope this helps!
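A minimal sketch of Method 2, assuming hypothetical records of the form "name,rating" processed in a loop: each step gets its own try/except, so a bad field is reported with its exact location and the rest of the record is still used. It catches specific exceptions rather than a bare Exception, which keeps unrelated bugs from being swallowed.

```python
records = ["alice,1500", "bob", "carol,abc"]  # hypothetical input

parsed = []
for lineno, record in enumerate(records, start=1):
    fields = record.split(",")
    name = fields[0]

    # Step 1: the rating field may be missing.
    try:
        raw_rating = fields[1]
    except IndexError:
        print(f"line {lineno}: missing rating for {name!r}")
        raw_rating = None

    # Step 2: the rating field may not be a number.
    rating = None
    if raw_rating is not None:
        try:
            rating = int(raw_rating)
        except ValueError:
            print(f"line {lineno}: bad rating {raw_rating!r} for {name!r}")

    parsed.append((name, rating))
```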