I have a file that I want to read that is itself zipped within a zip archive. For example, parent.zip contains child.zip, which contains child.txt. I am having trouble reading child.zip. Can anyone correct my code?
I assume that I need to create child.zip as a file-like object and then open it with a second instance of zipfile, but being new to Python, my attempt, zipfile.ZipFile(zfile.open(name)), does not work. It raises zipfile.BadZipfile: "File is not a zip file" on child.zip, which I have validated independently.
import logging
import re
import zipfile

with zipfile.ZipFile("parent.zip", "r") as zfile:
    for name in zfile.namelist():
        if re.search(r'\.zip$', name) is not None:
            # We have a zip within a zip
            # The next line is the one that raises zipfile.BadZipfile
            with zipfile.ZipFile(zfile.open(name)) as zfile2:
                for name2 in zfile2.namelist():
                    # Now we can extract
                    logging.info("Found internal internal file: " + name2)
                    print "Processing code goes here"
To get this to work with Python 3.3 (on Windows, though that is probably irrelevant), I had to use io.BytesIO in place of cStringIO, which does not exist in Python 3.
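A minimal Python 3 sketch of the question's loop with that substitution might look like the following. The function name list_nested_members and the use of endswith() instead of the regular expression are assumptions made for illustration, not part of the original code.

```python
import io
import zipfile


def list_nested_members(parent):
    """Return (inner_zip_name, member_name) pairs for every zip inside `parent`.

    `parent` may be a path or a seekable binary file object.
    """
    found = []
    with zipfile.ZipFile(parent, "r") as zfile:
        for name in zfile.namelist():
            if not name.lower().endswith(".zip"):
                continue
            # Read the whole inner archive into memory; BytesIO is seekable,
            # which is what the second ZipFile needs.
            inner_bytes = io.BytesIO(zfile.read(name))
            with zipfile.ZipFile(inner_bytes) as zfile2:
                for name2 in zfile2.namelist():
                    found.append((name, name2))
    return found


# Example call (assumes a parent.zip exists in the working directory):
# for inner, member in list_nested_members("parent.zip"):
#     print("Found internal file:", inner, member)
```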
Here's a function I came up with. (Copied from here.)
Here's how I tested it:
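A minimal sketch of a helper of this kind follows. It is not the function referred to above, and the name read_nested and its list-of-names argument are hypothetical. It assumes Python 3 and that every name except the last refers to a zip archive.

```python
import io
import zipfile


def read_nested(archive, inner_path):
    """Read one member that may sit several zip levels deep.

    `archive` is a path or seekable binary file object for the outermost zip.
    `inner_path` is a list of member names, e.g. ["child.zip", "child.txt"];
    every element but the last must itself be a zip archive.
    """
    with zipfile.ZipFile(archive, "r") as zf:
        data = zf.read(inner_path[0])
    if len(inner_path) == 1:
        return data
    # Wrap the bytes in a seekable in-memory buffer and descend one level.
    return read_nested(io.BytesIO(data), inner_path[1:])
```

One way to test it without touching the disk is to build the nested archive in memory with ZipFile.writestr() and compare the bytes that come back, for example read_nested(buffer, ["child.zip", "child.txt"]).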
When you use the .open() call on a ZipFile instance you indeed get an open file handle. However, to read a zip file, the ZipFile class needs a little more. It needs to be able to seek on that file, and the object returned by .open() is not seekable in your case. Only Python 3 (3.2 and up) produces a ZipExtFile object that supports seeking (provided the underlying file handle for the outer zip file is seekable, and nothing is trying to write to the ZipFile object).
The workaround is to read the whole zip entry into memory using .read(), store it in a BytesIO object (an in-memory file that is seekable), and feed that to ZipFile. The same approach applies in the context of your example.
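The core of that workaround can be sketched in a few lines. The helper name open_inner_zip is an assumption for illustration, and the code is written for Python 3.

```python
import io
import zipfile


def open_inner_zip(outer, name):
    """Return a ZipFile for the archive stored as entry `name` in `outer`.

    `outer` is an already-open zipfile.ZipFile.
    """
    # .read() returns the decompressed entry as bytes; BytesIO makes those
    # bytes available as a seekable file-like object.
    inner_data = io.BytesIO(outer.read(name))
    return zipfile.ZipFile(inner_data)


# In the context of the question (assumes parent.zip exists on disk):
# with zipfile.ZipFile("parent.zip", "r") as zfile:
#     for name in zfile.namelist():
#         if name.endswith(".zip"):
#             with open_inner_zip(zfile, name) as zfile2:
#                 for name2 in zfile2.namelist():
#                     print("Found internal file:", name2)
```

Note that this holds the entire inner archive in memory, so it suits nested archives of moderate size.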