Using csv.reader against a gzipped file in Python

Posted 2019-03-12 02:20

Question:

I have a bunch of gzipped CSV files that I'd like to open for inspection using Python's built-in CSV reader. I'd like to do this without first having to unzip them to disk manually. I guess I want to somehow get a stream of the uncompressed data and pass it to the CSV reader. Is this possible in Python?

Answer 1:

Use the gzip module:

import csv, gzip

with gzip.open(filename, "rt", newline="") as f:  # "rt": text mode, which csv needs in Python 3
    reader = csv.reader(f)
    #...


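Answer 2:

Opening the file with a plain gzip.open(filename), i.e. in the default binary mode, didn't work for me in Python 3.3, for either writing or reading, because of a "bytes" error: the csv module expects text, not bytes. After some trial and error I got the following to work by wrapping the binary stream in io.TextIOWrapper. Maybe it helps others too: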

import csv
import gzip
import io


with gzip.open("test.gz", "w") as file:
    writer = csv.writer(io.TextIOWrapper(file, newline="", write_through=True))
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

with gzip.open("test.gz", "r") as file:
    reader = csv.reader(io.TextIOWrapper(file, newline=""))
    print(list(reader))

As amohr suggests, the following works as well:

import gzip, csv

with gzip.open("test.gz", "wt", newline="") as file:
    writer = csv.writer(file)
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

with gzip.open("test.gz", "rt", newline="") as file:
    reader = csv.reader(file)
    print(list(reader))
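
To make the "bytes" error concrete, here is a minimal sketch that reads the same file once in binary mode and once in text mode. The helper name read_rows and the temporary file demo.csv.gz are made up for this illustration; only gzip.open and csv.reader come from the answers above.

```python
import csv
import gzip
import os
import tempfile


def read_rows(path, mode):
    """Read all rows of a gzipped CSV opened with the given mode (hypothetical helper)."""
    # newline="" is only accepted by gzip.open in text mode
    kwargs = {"newline": ""} if "t" in mode else {}
    with gzip.open(path, mode, **kwargs) as f:
        return list(csv.reader(f))


# Write a small gzipped CSV to a temporary location
path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(path, "wt", newline="") as f:
    csv.writer(f).writerows([[1, 2, 3], [4, 5, 6]])

# Binary mode yields bytes, which csv.reader rejects in Python 3
try:
    read_rows(path, "rb")
    binary_error = None
except csv.Error as exc:
    binary_error = str(exc)

# Text mode yields str, so the rows come back as lists of strings
text_rows = read_rows(path, "rt")

print(binary_error)
print(text_rows)
```

Note that the values come back as strings; the csv module does not convert them to numbers.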


Answer 3:

A more complete solution:

import csv, gzip

class GZipCSVReader:
    def __init__(self, filename):
        # Text mode ("rt") so that DictReader receives str, not bytes (Python 3)
        self.gzfile = gzip.open(filename, "rt", newline="")
        self.reader = csv.DictReader(self.gzfile)
    def __next__(self):
        return next(self.reader)
    def close(self):
        self.gzfile.close()
    def __iter__(self):
        return iter(self.reader)

Now you can use it like this:

r = GZipCSVReader('my.csv.gz')
for row in r:
    # each row is a dict, so iterate over its (key, value) pairs
    for k, v in row.items():
        print(k, v)
r.close()
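
One caveat with the usage above: if the loop raises, close() is never called. A possible way around that, sketched here under the assumption that the class keeps its close() method, is contextlib.closing from the standard library. The class is restated in Python 3 form so the snippet runs on its own, and the file name demo.csv.gz is made up.

```python
import contextlib
import csv
import gzip
import os
import tempfile


class GZipCSVReader:
    """Python 3 restatement of the class from this answer."""

    def __init__(self, filename):
        self.gzfile = gzip.open(filename, "rt", newline="")
        self.reader = csv.DictReader(self.gzfile)

    def __iter__(self):
        return iter(self.reader)

    def close(self):
        self.gzfile.close()


# Create a small gzipped CSV with a header row for the demo
path = os.path.join(tempfile.mkdtemp(), "demo.csv.gz")
with gzip.open(path, "wt", newline="") as f:
    csv.writer(f).writerows([["k", "v"], ["x", "1"]])

# closing() calls r.close() on exit, even if the body raises
with contextlib.closing(GZipCSVReader(path)) as r:
    rows = [dict(row) for row in r]

closed = r.gzfile.closed
print(rows, closed)
```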

EDIT: Following the comment below, how about a simpler approach:

def gzipped_csv(filename):
    # "rt" = text mode; the csv module needs str rather than bytes in Python 3
    with gzip.open(filename, "rt", newline="") as f:
        r = csv.DictReader(f)
        for row in r:
            yield row

which lets you then write:

for row in gzipped_csv(filename):
    for k, v in row.items():
        print(k, v)
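
Since the question mentions a whole bunch of files, the generator idea extends naturally to a glob pattern. This is only a sketch: the function name iter_gzipped_rows, the *.csv.gz naming convention and the demo files are assumptions, not something from the question.

```python
import csv
import glob
import gzip
import os
import tempfile


def iter_gzipped_rows(pattern):
    """Yield (path, row) for every row of every gzipped CSV matching pattern."""
    for path in sorted(glob.glob(pattern)):
        with gzip.open(path, "rt", newline="") as f:
            for row in csv.DictReader(f):
                yield path, row


# Build two tiny gzipped CSV files to inspect
demo_dir = tempfile.mkdtemp()
for name, value in [("a", "1"), ("b", "2")]:
    target = os.path.join(demo_dir, name + ".csv.gz")
    with gzip.open(target, "wt", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["key", "value"])
        writer.writerow([name, value])

found = [
    (os.path.basename(p), dict(r))
    for p, r in iter_gzipped_rows(os.path.join(demo_dir, "*.csv.gz"))
]
print(found)
```

Each file is opened only while its rows are being consumed, so nothing is unzipped to disk and only one file is open at a time.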


Tags: python csv gzip