Complete a multipart_upload with boto3?

2020-02-28 23:16发布

问题:

Tried this:

import boto3
from boto3.s3.transfer import TransferConfig, S3Transfer
path = "/temp/"
fileName = "bigFile.gz" # this happens to be a 5.9 Gig file
client = boto3.client('s3', region)
config = TransferConfig(
    multipart_threshold=4*1024, # number of bytes
    max_concurrency=10,
    num_download_attempts=10,
)
transfer = S3Transfer(client, config)
transfer.upload_file(path+fileName, 'bucket', 'key')

Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.

I found this example, but part is not defined.

import boto3

bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'

s3 = boto3.client('s3')

# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path+fileName,'rb') as data:
    part1 = s3.upload_part(Bucket=bucket
                           , Key=key
                           , PartNumber=1
                           , UploadId=mpu['UploadId']
                           , Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
    'Parts': [
        {
            'PartNumber': 1,
            'ETag': part['ETag']
        }
    ]
}

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket
                             , Key=key
                             , UploadId=mpu['UploadId']
                             , MultipartUpload=part_info)

Question: Does anyone know how to use the multipart upload with boto3?

回答1:

I would advise you to use boto3.s3.transfer for this purpose. Here is an example:

import boto3


def upload_file(filename):
    session = boto3.Session()
    s3_client = session.client("s3")

    try:
        print("Uploading file: {}".format(filename))

        tc = boto3.s3.transfer.TransferConfig()
        t = boto3.s3.transfer.S3Transfer(client=s3_client, config=tc)

        t.upload_file(filename, "my-bucket-name", "name-in-s3.dat")

    except Exception as e:
        print("Error uploading: {}".format(e))


回答2:

Why not use just the copy option in boto3?

s3.copy(CopySource={
        'Bucket': sourceBucket,
        'Key': sourceKey}, 
    Bucket=targetBucket,
    Key=targetKey,
    ExtraArgs={'ACL': 'bucket-owner-full-control'})

There are details on how to initialise s3 object and obviously further options for the call available here boto3 docs.



回答3:

Your code was already correct. Indeed, a minimal example of a multipart upload just looks like this:

import boto3
s3 = boto3.client('s3')
s3.upload_file('my_big_local_file.txt', 'some_bucket', 'some_key')

You don't need to explicitly ask for a multipart upload, or use any of the lower-level functions in boto3 that relate to multipart uploads. Just call upload_file, and boto3 will automatically use a multipart upload if your file size is above a certain threshold (which defaults to 8MB).

You seem to have been confused by the fact that the end result in S3 wasn't visible made up of multiple parts:

Result: 5.9 gig file on s3. Doesn't seem to contain multiple parts.

... but this is the expected outcome. The whole point of the multipart upload API is to let you upload a single file over multiple HTTP requests and end up with a single object in S3.



回答4:

In your code snippet, clearly should be part -> part1 in the dictionary. Typically, you would have several parts (otherwise why use multi-part upload), and the 'Parts' list would contain an element for each part.

You may also be interested in the new pythonic interface to dealing with S3: http://s3fs.readthedocs.org/en/latest/



回答5:

Change Part to Part1

import boto3

bucket = 'bucket'
path = "/temp/"
fileName = "bigFile.gz"
key = 'key'

s3 = boto3.client('s3')

# Initiate the multipart upload and send the part(s)
mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
with open(path+fileName,'rb') as data:
    part1 = s3.upload_part(Bucket=bucket
                       , Key=key
                       , PartNumber=1
                       , UploadId=mpu['UploadId']
                       , Body=data)

# Next, we need to gather information about each part to complete
# the upload. Needed are the part number and ETag.
part_info = {
  'Parts': [
    {
        'PartNumber': 1,
        'ETag': part1['ETag']
    }
   ]
  }

# Now the upload works!
s3.complete_multipart_upload(Bucket=bucket
                         , Key=key
                         , UploadId=mpu['UploadId']
                         , MultipartUpload=part_info)