Question:
I have a bunch of gzipped CSV files that I'd like to open for inspection using Python's built-in CSV reader. I'd like to do this without first having to unzip them to disk manually. I guess I want to somehow get a stream to the uncompressed data and pass it into the CSV reader. Is this possible in Python?
Answer 1:
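Use the gzip module. Note that gzip.open defaults to binary mode; in Python 3, pass "rt" so that csv.reader receives text rather than bytes:
import csv
import gzip

with gzip.open(filename, "rt", newline="") as f:
    reader = csv.reader(f)
    #...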
Answer 2:
I tried the approach above with gzip.open's default binary mode for writing and reading, and it failed in Python 3.3 with a "bytes" error. After some trial and error, I got the following to work. Maybe it helps others too:
import csv
import gzip
import io

# Write: wrap the binary gzip stream so csv.writer can write text to it.
with gzip.open("test.gz", "w") as file:
    writer = csv.writer(io.TextIOWrapper(file, newline="", write_through=True))
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

# Read: wrap the binary gzip stream so csv.reader receives text.
with gzip.open("test.gz", "r") as file:
    reader = csv.reader(io.TextIOWrapper(file, newline=""))
    print(list(reader))
As amohr suggests, the following works as well:
import csv
import gzip

# "wt" and "rt" open the gzip stream in text mode, so no wrapper is needed.
with gzip.open("test.gz", "wt", newline="") as file:
    writer = csv.writer(file)
    writer.writerow([1, 2, 3])
    writer.writerow([4, 5, 6])

with gzip.open("test.gz", "rt", newline="") as file:
    reader = csv.reader(file)
    print(list(reader))
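For anyone wondering where the "bytes" error in this answer comes from, here is a minimal sketch that reads the same gzipped file once in text mode and once in binary mode. The file name demo.csv.gz and the helper read_rows are made up for illustration, and the file is written to a temporary directory so nothing is left on disk:

```python
import csv
import gzip
import os
import tempfile


def read_rows(path, mode):
    """Read all CSV rows from a gzipped file opened in the given mode.

    Hypothetical helper for this demonstration only.
    """
    # newline="" is only accepted in text mode, so pass it only there.
    kwargs = {"newline": ""} if "t" in mode else {}
    with gzip.open(path, mode, **kwargs) as f:
        return list(csv.reader(f))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.csv.gz")

    # Create a small gzipped CSV file to read back.
    with gzip.open(path, "wt", newline="") as f:
        csv.writer(f).writerows([[1, 2, 3], [4, 5, 6]])

    # Text mode: csv.reader receives str and parses the rows.
    text_rows = read_rows(path, "rt")

    # Binary mode: csv.reader receives bytes and raises csv.Error.
    try:
        read_rows(path, "rb")
        binary_error = None
    except csv.Error as exc:
        binary_error = str(exc)

print(text_rows)
print(binary_error)
```

In Python 3 the second read should fail with a csv.Error complaining that the iterator returned bytes instead of strings, which is why either "rt" or an io.TextIOWrapper is needed.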
Answer 3:
A more complete solution:
import csv
import gzip

class GZipCSVReader:
    def __init__(self, filename):
        # Open in text mode ("rt"); in Python 3 the csv module needs str, not bytes.
        self.gzfile = gzip.open(filename, "rt", newline="")
        self.reader = csv.DictReader(self.gzfile)

    def __next__(self):
        return next(self.reader)

    def close(self):
        self.gzfile.close()

    def __iter__(self):
        return iter(self.reader)
Now you can use it like this:
r = GZipCSVReader('my.csv')
for row in r:
    # Each row is a dict mapping column names to values.
    for k, v in row.items():
        print(k, v)
r.close()
EDIT: Following the comment below, how about a simpler approach:
def gzipped_csv(filename):
    # Text mode ("rt") so that csv.DictReader receives str in Python 3.
    with gzip.open(filename, "rt", newline="") as f:
        r = csv.DictReader(f)
        for row in r:
            yield row
which then lets you write:
for row in gzipped_csv(filename):
    for k, v in row.items():
        print(k, v)
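To try the generator approach end to end without any existing data, here is a self-contained sketch. It writes a small gzipped CSV to a temporary directory and streams it back row by row. The sample file people.csv.gz, its columns, and the encoding parameter are my own assumptions and not part of the original answer:

```python
import csv
import gzip
import os
import tempfile


def gzipped_csv(filename, encoding="utf-8"):
    """Yield each row of a gzipped CSV file as a dict keyed by the header.

    The encoding parameter is an assumption added for this sketch.
    """
    with gzip.open(filename, "rt", newline="", encoding=encoding) as f:
        # Rows are decompressed and parsed lazily, one at a time.
        yield from csv.DictReader(f)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "people.csv.gz")

    # Write a header line and two sample records.
    with gzip.open(path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["name", "age"])
        writer.writerow(["Ada", "36"])
        writer.writerow(["Linus", "54"])

    # Consume the generator while the temporary file still exists.
    rows = list(gzipped_csv(path))

for row in rows:
    print(row["name"], row["age"])
```

Because the generator keeps the file open only while it is being iterated, nothing is ever unzipped to disk, and large files can be inspected without loading them fully into memory (the list() call here is just for the demonstration).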