Case: There is a large zip file in an S3 bucket containing a large number of images. Is there a way to read the metadata, or something similar, to find out how many files are inside the zip without downloading the whole file?
When the file is local, in Python I can just open it with zipfile.ZipFile() and call the namelist() method, which returns a list of all the files inside, and then count that. However, I am not sure how to do this when the file resides in S3 without having to download it. If this is possible from Lambda, even better.
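The local approach described above can be sketched as follows; the throwaway in-memory archive is only there to make the demo self-contained:

```python
import io
import zipfile

def count_zip_entries(zip_source):
    """Count the files inside a zip (path or file-like object) without extracting it."""
    with zipfile.ZipFile(zip_source) as zf:
        return len(zf.namelist())

# Quick demonstration with a throwaway in-memory archive:
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("img1.png", b"...")
    zf.writestr("img2.png", b"...")

print(count_zip_entries(buf))  # -> 2
```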
As far as I know, S3 itself does not expose this kind of information, so you cannot get it without reading the zip file. What you can do instead is store the required information as object metadata when uploading the zip to S3.
As you mentioned in your question, Python's zipfile module can list an archive's contents without extracting it. Use the same approach to compute the file count locally, attach it as metadata to the object, and then upload it to S3.
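A minimal sketch of that workflow, assuming boto3 is available and using placeholder bucket/key names: the count is computed locally with zipfile and attached as user-defined object metadata, which can later be read with head_object without downloading the archive.

```python
import zipfile

def count_entries(zip_source):
    """Count entries in a zip (path or file-like object) without extracting."""
    with zipfile.ZipFile(zip_source) as zf:
        return len(zf.namelist())

def upload_zip_with_count(path, bucket, key):
    """Upload a zip to S3, storing its entry count as object metadata.

    Requires boto3 and AWS credentials; bucket and key are placeholders.
    """
    import boto3  # imported here so count_entries() works without boto3 installed

    count = count_entries(path)
    s3 = boto3.client("s3")
    with open(path, "rb") as body:
        s3.put_object(
            Bucket=bucket,
            Key=key,
            Body=body,
            # User-defined metadata, retrievable later via head_object()
            # without fetching the body.
            Metadata={"file-count": str(count)},
        )
    return count
```

Later, `s3.head_object(Bucket=bucket, Key=key)["Metadata"]["file-count"]` returns the stored count without downloading the zip.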
Hope this helps, thanks.
You can try to download only a part of the archive and use the jar tool to see the file list and attributes. Note that the central directory, which records every entry, is stored at the end of a zip file, so fetch the last portion (say, the final 1 MB via an S3 ranged GET) rather than the first. You can then use the subprocess module to obtain this data in Python.
I think this will solve your problem: