How do I detect missing fields in a CSV file in a

Posted 2020-02-10 04:57

Question:

I'm trying to parse a CSV file using Python's csv module (specifically, the DictReader class). Is there a Pythonic way to detect empty or missing fields and throw an error?

Here's a sample file using the following headers: NAME, LABEL, VALUE

foo,bar,baz
yes,no
x,y,z

When parsing, I'd like the second line to throw an error since it's missing the VALUE field.

Here's a code snippet which shows how I'm approaching this (disregard the hard-coded strings...they're only present for brevity):

import csv

HEADERS = ["name", "label", "value"]

# the with block closes the file even if a check below raises
with open('configFile', newline='') as fileH:
    reader = csv.DictReader(fileH, HEADERS)
    for row in reader:
        if row["name"] is None or row["name"] == "":
            raise ValueError("missing name")
        if row["label"] is None or row["label"] == "":
            raise ValueError("missing label")
        ...

Is there a cleaner way of checking for fields in the CSV file without a bunch of if statements? If I need to add more fields, I'll also need more conditionals, which I would like to avoid if possible.

Answer 1:

if any(row[key] in (None, "") for key in row):
    raise ValueError("empty or missing field")

Edit: Even better:

# row.values() on Python 3; itervalues() exists only on Python 2
if any(val in (None, "") for val in row.values()):
    raise ValueError("empty or missing field")


Answer 2:

Since None and the empty string are both falsy, you should consider this:

for row in reader:
    for header in HEADERS:
        if not row[header]:
            # the header name makes the error specific
            raise ValueError("missing field: %s" % header)

Note that, unlike some other answers, you will still have the option of raising an informative, header-specific error.
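As a rough, self-contained sketch of that header-specific error, the loop can be wrapped in a generator and run against the sample data from the question. The name check_rows, the ValueError type, and the message format are assumptions made for illustration; reader.line_num is the csv module's own counter of source lines read so far.

```python
import csv
import io

HEADERS = ["name", "label", "value"]

# Sample data from the question; the second line lacks the VALUE field.
SAMPLE = "foo,bar,baz\nyes,no\nx,y,z\n"


def check_rows(fileobj, headers):
    """Yield each row, raising ValueError at the first empty or missing field."""
    reader = csv.DictReader(fileobj, headers)
    for row in reader:
        for header in headers:
            # DictReader fills fields missing from a short row with None
            if not row[header]:
                raise ValueError(
                    "line %d: missing field %r" % (reader.line_num, header)
                )
        yield row


try:
    rows = list(check_rows(io.StringIO(SAMPLE), HEADERS))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Raising from inside the loop stops at the first bad row, which suits a config file that must be fully valid before it is used.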



Answer 3:

Something like this?

...
for row in reader:
    for column, value in row.items():
        if value is None or value == "":
            # the column name says which field is missing
            raise ValueError("missing field: %s" % column)

You may be able to use 'if not value:' as your test instead of the more explicit test you gave.
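A small sketch of why the shorter test should be safe here: DictReader yields every present field as a string, so a value such as "0" is still truthy, and only None (a field missing from a short row) and "" (an empty field) fail 'if not value:'. The sample data below is made up for illustration.

```python
import csv
import io

HEADERS = ["name", "label", "value"]

# Hypothetical sample: a "0" label, an empty LABEL, and a missing VALUE.
SAMPLE = "foo,0,baz\nyes,,z\nx,y\n"

empty_columns = []
for row in csv.DictReader(io.StringIO(SAMPLE), HEADERS):
    # collect the columns that fail the truthiness test for this row
    empty_columns.append([column for column, value in row.items() if not value])

print(empty_columns)
```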



Answer 4:

This code will provide, for each row, a list of field names which are not present (or are empty) for that row. You could then provide a more detailed exception, such as "Missing fields: foo, baz".

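def missing(row):
    # names of the headers whose value is absent, None, or empty
    return [h for h in HEADERS if not row.get(h)]

for row in reader:
    m = missing(row)
    if m:  # test the result, not the function object (which is always truthy)
        raise ValueError("Missing fields: " + ", ".join(m))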
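For a runnable illustration of the "Missing fields: ..." message, here is a minimal sketch built around the same missing() helper. The validate function, the ValueError type, and the sample data (whose second line has an empty NAME and no VALUE) are assumptions, not part of the original answer.

```python
import csv
import io

HEADERS = ["name", "label", "value"]

# Hypothetical sample: the second line has an empty NAME and no VALUE field.
SAMPLE = "foo,bar,baz\n,no\nx,y,z\n"


def missing(row):
    """Return the header names whose value is None or empty in this row."""
    return [h for h in HEADERS if not row.get(h)]


def validate(fileobj):
    """Raise ValueError naming every missing field of the first bad row."""
    reader = csv.DictReader(fileobj, HEADERS)
    for row in reader:
        m = missing(row)
        if m:
            raise ValueError(
                "line %d: missing fields: %s" % (reader.line_num, ", ".join(m))
            )


try:
    validate(io.StringIO(SAMPLE))
except ValueError as exc:
    error_message = str(exc)
    print(error_message)
```

Reporting all missing fields of a row at once saves the user from fixing one field, rerunning, and then hitting the next error on the same line.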


Answer 5:

If you use matplotlib.mlab.csv2rec, it already saves the content of the file into an array and raises an error if one of the values is missing.

>>> from matplotlib.mlab import csv2rec
>>> content_array = csv2rec('file.txt')
IndexError: list index out of range

The problem is that there is no simple way to customize this behaviour, or to supply a default value for missing fields. Moreover, the error message is not very explanatory (it might be worth filing a bug report).

P.S. Since csv2rec saves the content of the file into a NumPy record array, it will be easier to get the values equal to None.