
What is the pythonic way to read CSV file data as

Posted 2019-01-22 18:30

Question:

What is the best way to take a data file that contains a header row and read this row into a named tuple so that the data rows can be accessed by header name?

I was attempting something like this:

import csv
from collections import namedtuple

with open('data_file.txt', mode="r") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", ", ".join(i for i in reader[0]))
    next(reader)
    for row in reader:
        data = Data(*row)

The reader object is not subscriptable, so the above code throws a TypeError. What is the pythonic way to read a file header into a namedtuple?

Answer 1:

Use:

Data = namedtuple("Data", next(reader))

and omit the line:

next(reader)

Combining this with an iterative version based on martineau's comment below, the example for Python 2 becomes:

import csv
from collections import namedtuple
from itertools import imap

with open("data_file.txt", mode="rb") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in imap(Data._make, reader):
        print data.foo
        # ...further processing of a line...

and for Python 3:

import csv
from collections import namedtuple

with open("data_file.txt", newline="") as infile:
    reader = csv.reader(infile)
    Data = namedtuple("Data", next(reader))  # get names from column headers
    for data in map(Data._make, reader):
        print(data.foo)
        # ...further processing of a line...
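
As a rough, self-contained sketch of the same approach that runs without a file on disk: the sample data, the column names foo and bar, and the helper read_rows are made up for illustration, and io.StringIO stands in for the opened file.

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for data_file.txt.
SAMPLE = "foo,bar\n1,2\n3,4\n"


def read_rows(infile):
    """Return the data rows of a CSV file object as a list of named tuples."""
    reader = csv.reader(infile)
    # The first row supplies the field names; next() also consumes the header,
    # so the loop below only sees data rows.
    Data = namedtuple("Data", next(reader))
    return [Data._make(row) for row in reader]


rows = read_rows(io.StringIO(SAMPLE, newline=""))
for data in rows:
    print(data.foo, data.bar)
```

Note that the csv module does no type conversion, so every field is a string. The header names also have to be valid Python identifiers, otherwise namedtuple raises a ValueError; passing rename=True to namedtuple replaces invalid names with positional ones such as _0.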


Answer 2:

Please have a look at csv.DictReader. It takes the column names from the first row, as you're looking for, and then lets you access each column in a row by name using a dictionary.

If for some reason you still need to access the rows as a collections.namedtuple, it should be easy to transform the dictionaries to named tuples as follows:

import collections
import csv

with open('data_file.txt') as infile:
    reader = csv.DictReader(infile)
    # reader.fieldnames holds the column names read from the header row
    Data = collections.namedtuple('Data', reader.fieldnames)
    tuples = [Data(**row) for row in reader]
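
A small self-contained sketch of both access styles, assuming made-up sample data with columns foo and bar and using io.StringIO in place of a real file:

```python
import csv
import io
from collections import namedtuple

# Hypothetical sample data standing in for data_file.txt.
SAMPLE = "foo,bar\n1,2\n3,4\n"

reader = csv.DictReader(io.StringIO(SAMPLE, newline=""))

# Each row is a mapping keyed by the header names.
by_name = list(reader)
print(by_name[0]["foo"])

# The same rows converted to named tuples for attribute access.
Data = namedtuple("Data", reader.fieldnames)
tuples = [Data(**row) for row in by_name]
print(tuples[1].bar)
```

The conversion with Data(**row) only works when every header is a valid Python identifier and each row has exactly the columns named in the header; for plain lookup by column name, the dictionaries from DictReader are already enough.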