I'm trying to parse a CSV file using Python's csv module (specifically, the DictReader class). Is there a Pythonic way to detect empty or missing fields and throw an error?
Here's a sample file using the following headers: NAME, LABEL, VALUE
foo,bar,baz
yes,no
x,y,z
When parsing, I'd like the second line to throw an error since it's missing the VALUE field.
Here's a code snippet which shows how I'm approaching this (disregard the hard-coded strings...they're only present for brevity):
import csv

HEADERS = ["name", "label", "value"]
fileH = open('configFile')
reader = csv.DictReader(fileH, HEADERS)
for row in reader:
    if row["name"] is None or row["name"] == "":
        raise ValueError("name is missing")
    if row["label"] is None or row["label"] == "":
        raise ValueError("label is missing")
    ...
fileH.close()
Is there a cleaner way of checking for fields in the CSV file without having a bunch of if statements? If I need to add more fields, I'll also need more conditionals, which I would like to avoid if possible.
if any(row[key] in (None, "") for key in row):
    raise ValueError("empty or missing field")
Edit: Even better:
# On Python 2, row.itervalues() does the same without building a list.
if any(val in (None, "") for val in row.values()):
    raise ValueError("empty or missing field")
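As a self-contained check of the idea above, here is a minimal sketch that feeds the sample data from the question through csv.DictReader, using io.StringIO in place of a real file. It relies on DictReader setting any field missing from a short row to restval, which defaults to None. The names SAMPLE and find_incomplete_rows are hypothetical, introduced only for illustration.

```python
import csv
import io

HEADERS = ["name", "label", "value"]

# Sample data from the question; the second row lacks the "value" field.
SAMPLE = "foo,bar,baz\nyes,no\nx,y,z\n"


def find_incomplete_rows(text):
    """Return the line numbers of rows with an empty or missing field."""
    reader = csv.DictReader(io.StringIO(text), HEADERS)
    bad = []
    for row in reader:
        # Fields missing from a short row come back as None (the default restval).
        if any(val in (None, "") for val in row.values()):
            bad.append(reader.line_num)
    return bad


incomplete = find_incomplete_rows(SAMPLE)
```

Collecting line numbers instead of raising on the first hit is just one design choice; swapping the append for a raise gives the fail-fast behaviour the question asks for.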
Since None and empty strings both evaluate to False, you should consider this:
for row in reader:
    for header in HEADERS:
        if not row[header]:
            raise ValueError("missing field: %s" % header)
Note that, unlike some other answers, you will still have the option of raising an informative, header-specific error.
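To illustrate the header-specific error, here is a small sketch under a few assumptions: the data comes from an in-memory string rather than a file, ValueError stands in for whatever exception type you prefer, and validate is a hypothetical helper name. It uses the reader's line_num attribute to report where the problem occurred.

```python
import csv
import io

HEADERS = ["name", "label", "value"]


def validate(text):
    """Return all rows, or raise ValueError naming the first bad field."""
    reader = csv.DictReader(io.StringIO(text), HEADERS)
    rows = []
    for row in reader:
        for header in HEADERS:
            # None (missing) and "" (empty) are both falsy.
            if not row[header]:
                raise ValueError(
                    "line %d: field %r is empty or missing"
                    % (reader.line_num, header)
                )
        rows.append(row)
    return rows


try:
    validate("foo,bar,baz\nyes,no\nx,y,z\n")
except ValueError as exc:
    message = str(exc)
```

Iterating over HEADERS rather than over the row's keys means only the expected columns are checked, so adding a field later only requires extending the HEADERS list.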
Something like this?
...
for row in reader:
    for column, value in row.items():
        if value is None or value == "":
            # Use the column name to say which field is missing.
            raise ValueError("missing field: %s" % column)
You may be able to use 'if not value:' as your test instead of the more explicit test you gave.
This code will provide, for each row, a list of field names which are not present (or are empty) for that row. You could then provide a more detailed exception, such as "Missing fields: foo, baz".
def missing(row):
    return [h for h in HEADERS if not row.get(h)]

for row in reader:
    m = missing(row)
    if m:  # test the returned list; the function object itself is always truthy
        raise ValueError("Missing fields: " + ", ".join(m))
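One way to carry the list of missing fields to the caller is a small custom exception. The sketch below is an illustration, not part of the csv module: MissingFieldsError and check are hypothetical names, and the input is an in-memory string so the example runs on its own.

```python
import csv
import io

HEADERS = ["name", "label", "value"]


class MissingFieldsError(ValueError):
    """Hypothetical exception that records which fields a row lacks."""

    def __init__(self, line_num, fields):
        self.line_num = line_num
        self.fields = fields
        super().__init__(
            "line %d: missing fields: %s" % (line_num, ", ".join(fields))
        )


def missing(row):
    # row.get() returns None for absent keys, so absent, None, and ""
    # values are all reported.
    return [h for h in HEADERS if not row.get(h)]


def check(text):
    """Raise MissingFieldsError for the first row with missing fields."""
    reader = csv.DictReader(io.StringIO(text), HEADERS)
    for row in reader:
        m = missing(row)
        if m:
            raise MissingFieldsError(reader.line_num, m)


try:
    check("foo,bar,baz\nyes\nx,y,z\n")
except MissingFieldsError as exc:
    error = exc
```

Because the exception keeps the field names as a list, calling code can either print the message or inspect error.fields directly.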
If you use matplotlib.mlab.csv2rec, it already saves the content of the file into an array and raises an error if one of the values is missing.
>>> from matplotlib.mlab import csv2rec
>>> content_array = csv2rec('file.txt')
IndexError: list index out of range
The problem is that there is no simple way to customize this behaviour, or to supply a default value for missing values. Moreover, the error message is not very explanatory (it could be useful to post a bug report here).
P.S. Since csv2rec saves the content of the file into a NumPy record array, it should be easier to find the values equal to None.