So far the files are just being downloaded individually like the following rather than all being in one zipped file:
s3client = boto3.client('s3')
s3client.download_file('firstbucket', obj['Key'], filename)
Let me save you some trouble by using AWS CLI:
aws s3 cp s3://mybucket/mydir/ . --recursive ; zip myzip.zip *.csv
You can change the wildcard to suit your needs, but this will generally run faster than a hand-rolled Python loop, since the AWS CLI parallelizes transfers far more aggressively than a naive boto3 script.
If you want to use boto3, you'll have to download each object in a loop like you have and add each item to a zip file.
With the CLI you can use s3 sync and then zip the result up https://docs.aws.amazon.com/cli/latest/reference/s3/sync.html
aws s3 sync s3://bucket-name ./local-location && zip -r bucket.zip ./local-location
It looks like you're really close, but you need to pass a file name to ZipFile.write(), and download_file does not return a file name. The following should work alright, but I haven't tested it exhaustively.
from tempfile import NamedTemporaryFile
from zipfile import ZipFile

import boto3


def archive_bucket(bucket_name, zip_name):
    s3 = boto3.client('s3')
    paginator = s3.get_paginator('list_objects_v2')
    with ZipFile(zip_name, 'w') as zf:
        for page in paginator.paginate(Bucket=bucket_name):
            for obj in page['Contents']:
                # This might have issues on some systems since the file will
                # be open for writes in two places. You can use other
                # methods of creating a temporary file to work around that.
                with NamedTemporaryFile() as f:
                    s3.download_file(bucket_name, obj['Key'], f.name)
                    # Copies over the temporary file using the key as the
                    # file name in the zip.
                    zf.write(f.name, obj['Key'])
This uses less disk space than the CLI-based solutions, but it still isn't ideal. At some point you will have two copies of a given file: one in the temp file and one already written into the zip. So you need to make sure you have enough space on disk for the total size of all the files you're downloading plus the size of the largest of those files. If there were a way to open a file-like object that wrote directly into the zip archive, you could avoid the temporary copy. I don't know of a way to do that, however.