I have multiple zip files containing different types of txt files, like below:
zip1
- file1.txt
- file2.txt
- file3.txt
How can I use pandas to read in each of those files without extracting them?
I know that if there were one file per zip I could use the compression argument of read_csv, like below:
df = pd.read_csv('textfile.zip', compression='zip')
Any help on how to do this would be great.
You can pass ZipFile.open() to pandas.read_csv() to construct a pandas.DataFrame from a CSV file packed into a multi-file zip.
Code:
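The original snippet did not survive here, so what follows is only a minimal sketch of the idea, not the answerer's exact code. The helper name read_member and the archive and member names in the usage note are hypothetical; it assumes the member is plain delimited text that read_csv can parse.

```python
import zipfile

import pandas as pd


def read_member(zip_path, member, **read_csv_kwargs):
    """Read one member of a zip archive into a DataFrame without extracting it.

    `read_member` is a hypothetical helper name; extra keyword arguments
    (for example sep or header) are forwarded to pandas.read_csv.
    """
    with zipfile.ZipFile(zip_path) as zf:
        # ZipFile.open() returns a binary file-like object for the member,
        # which read_csv accepts directly.
        with zf.open(member) as fh:
            return pd.read_csv(fh, **read_csv_kwargs)
```

With the layout from the question this would be called as df = read_member('zip1.zip', 'file1.txt'), assuming the archive is named zip1.zip.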
Example to read all .csv files into a dict:
I had a similar problem with XML files a while ago. The zipfile module can get you there.
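As a rough illustration of reading every file in one archive with the zipfile module, the sketch below builds a dict of DataFrames keyed by member name. The function name read_all_members and the suffix parameter are assumptions made for this example; it also assumes each matching member can be parsed by read_csv with the same options.

```python
import zipfile

import pandas as pd


def read_all_members(zip_path, suffix=".txt", **read_csv_kwargs):
    """Return {member name: DataFrame} for every member ending in `suffix`.

    Hypothetical helper: `suffix` filters out directories and unrelated files.
    """
    frames = {}
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if not name.endswith(suffix):
                continue
            # Open each member in turn; nothing is extracted to disk.
            with zf.open(name) as fh:
                frames[name] = pd.read_csv(fh, **read_csv_kwargs)
    return frames
```

Calling read_all_members('zip1.zip') on the archive from the question would give a dict with the keys file1.txt, file2.txt and file3.txt, again assuming that archive name.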
If you want to concatenate them into a pandas object then it might get a bit more complex, but that should get you started. Note that the read method returns bytes, so you may have to handle that as well.
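One possible way to handle both points, sketched under some assumptions: the bytes returned by ZipFile.read() are decoded (UTF-8 is assumed here) and wrapped in io.StringIO for read_csv, and the resulting frames are combined with pandas.concat. The function name concat_members and the source_file column are made up for illustration, and the sketch assumes all members share the same columns.

```python
import io
import zipfile

import pandas as pd


def concat_members(zip_path, encoding="utf-8", **read_csv_kwargs):
    """Concatenate every file in a zip archive into a single DataFrame.

    Hypothetical helper; assumes the members are text files with a common
    layout and that the archive contains at least one file.
    """
    frames = []
    with zipfile.ZipFile(zip_path) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # skip directory entries
            raw = zf.read(name)          # bytes, not str
            text = raw.decode(encoding)  # assumed encoding
            df = pd.read_csv(io.StringIO(text), **read_csv_kwargs)
            df["source_file"] = name     # remember where each row came from
            frames.append(df)
    # pd.concat raises ValueError on an empty list, hence the non-empty assumption.
    return pd.concat(frames, ignore_index=True)
```

If the files have different layouts, keeping them in a dict per member is probably the safer choice than concatenating.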