How to move object from Amazon S3 to Glacier with

Posted 2019-08-01 02:23

Question:

I'm looking for a way to move Amazon S3 objects to Glacier with Vault Lock enabled (as described here: https://aws.amazon.com/blogs/aws/glacier-vault-lock/). I'd like to use Amazon's built-in tools for that (lifecycle management or something similar) if possible.

I cannot find any instructions or options to do that. S3 seems to only allow moving objects to the Glacier storage class. But that does not provide data integrity nor defends against data loss.

I know I could do it with a program. It would download the S3 object and upload it to Glacier through their respective REST APIs. This approach seems too complicated for such a simple task.

Answer 1:

Picture it this way:

  • Glacier is a service of AWS.

  • S3 is a service of AWS.

  • But S3 is also a customer of the Glacier service.

When you migrate an object in S3 to the Glacier storage class, S3 stores the object in Glacier... using an AWS account that is owned by S3.

Those objects in S3 that use the GLACIER storage class aren't in "your" Glacier vaults; they're in vaults owned by S3.

This is consistent with the externally-observable evidence:

  • You can't see these S3 objects in vaults from the Glacier console.

  • You don't have to give S3 any IAM permissions to access Glacier (by contrast, you do have to give S3 permission to publish event notifications to SQS, SNS, or Lambda).

  • Glacier doesn't bill you for Glacier storage class objects -- S3 does.

In that light, what you are trying to accomplish is completely different. You want to store some archives in your Glacier vault, with your policy, and that content just "happens to be" stored in S3 at the moment.

Downloading from S3 and then uploading to Glacier is the solution.
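As a rough illustration of that download-then-upload step, here is a minimal sketch. It assumes boto3-style clients are passed in (for example `boto3.client("s3")` and `boto3.client("glacier")`); the function name and its arguments are hypothetical, and it assumes the object is small enough to hold in memory and to send as a single archive upload, and that the bucket and key are printable ASCII so they can be used as the archive description.

```python
def copy_s3_object_to_vault(s3, glacier, bucket, key, vault_name):
    """Download one S3 object and upload it to a Glacier vault as one archive.

    `s3` and `glacier` are assumed to be boto3-style clients.
    Returns the archive ID that Glacier assigns to the upload.
    """
    # get_object returns a dict whose "Body" is a readable stream.
    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()

    # upload_archive sends the whole payload in a single request.
    # The description records where the data came from (assumption:
    # bucket and key are printable ASCII).
    response = glacier.upload_archive(
        vaultName=vault_name,
        archiveDescription=f"s3://{bucket}/{key}",
        body=body,
    )
    return response["archiveId"]
```

Keep the returned archive ID somewhere safe, since it is what you need to retrieve the archive later. For large objects you would stream the data and use Glacier's multipart upload instead of reading everything into memory.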

"But that does not provide data integrity nor defends against data loss."

The integrity of the payload can be assured when uploading to Glacier because the tree hash algorithm effectively prevents corrupt uploads.
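To make the tree hash less abstract, here is a minimal sketch of how I understand the SHA-256 tree hash to be computed: hash each 1 MiB chunk, then repeatedly hash adjacent pairs of digests until one remains. The function name is mine, and it works on an in-memory byte string for simplicity; treat it as an illustration rather than a reference implementation.

```python
import hashlib

MIB = 1024 * 1024  # the tree hash works on 1 MiB chunks


def tree_hash(data: bytes) -> str:
    """Return the hex SHA-256 tree hash of `data`."""
    # Leaf level: one SHA-256 digest per 1 MiB chunk (empty input -> one empty chunk).
    chunks = [data[i:i + MIB] for i in range(0, len(data), MIB)] or [b""]
    level = [hashlib.sha256(chunk).digest() for chunk in chunks]

    # Combine adjacent digests pairwise until a single root digest is left.
    while len(level) > 1:
        next_level = []
        for i in range(0, len(level), 2):
            pair = level[i:i + 2]
            if len(pair) == 2:
                next_level.append(hashlib.sha256(pair[0] + pair[1]).digest())
            else:
                # An unpaired digest is carried up to the next level unchanged.
                next_level.append(pair[0])
        level = next_level

    return level[0].hex()
```

For a payload of 1 MiB or less, this reduces to the plain SHA-256 of the data. Comparing a locally computed value with the checksum Glacier returns for the upload is one way to confirm the archive arrived intact.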

When downloading from S3, the ETag can serve as a checksum, unless the object is stored with SSE-C. For an object uploaded in a single part, the ETag is the MD5 hash of the stored object; for a multipart upload, it is the hex-encoded MD5 hash of the concatenated binary-encoded MD5 hashes of the parts, followed by - and the number of parts. Ideally, when uploading to S3, you'd also store a stronger hash (e.g. SHA-256) in the object metadata, e.g. x-amz-meta-content-sha256.
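The two ETag forms described above can be sketched like this. The function name and the `part_size` parameter are hypothetical, and the multipart case assumes every part except the last had the same size, which you would need to know from whatever tool did the upload.

```python
import hashlib


def expected_etag(data: bytes, part_size=None) -> str:
    """Compute the ETag S3 is expected to report for `data`.

    part_size=None models a single-part upload; otherwise `data` is
    assumed to have been uploaded in parts of `part_size` bytes.
    """
    if part_size is None:
        # Single-part upload: the ETag is the hex MD5 of the object.
        return hashlib.md5(data).hexdigest()

    # Multipart upload: MD5 each part, concatenate the binary digests,
    # MD5 that, then append "-" and the number of parts.
    parts = [data[i:i + part_size] for i in range(0, len(data), part_size)] or [b""]
    digests = b"".join(hashlib.md5(part).digest() for part in parts)
    return f"{hashlib.md5(digests).hexdigest()}-{len(parts)}"
```

S3 returns the ETag wrapped in double quotes, so strip those before comparing it with the value computed from the downloaded bytes.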

Defense against data loss -- yes, Glacier does offer more functionality here, but S3 is not entirely without capability: a matching DENY statement in a bucket policy will always override any conflicting ALLOW, whether the ALLOW is in the bucket policy or in any other IAM policy (e.g. role, user).
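As an example of such a DENY, here is a sketch that builds a bucket policy forbidding object deletion for every principal. The helper name, the statement ID, and the bucket name in the usage note are placeholders, and the set of actions to deny is my assumption about what "defense against data loss" would need; adjust it to your case.

```python
import json


def deny_delete_policy(bucket: str) -> str:
    """Return a bucket policy (as JSON) that denies deleting objects in `bucket`."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyObjectDeletion",  # placeholder statement ID
                "Effect": "Deny",
                "Principal": "*",  # applies to every caller
                "Action": ["s3:DeleteObject", "s3:DeleteObjectVersion"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            }
        ],
    }
    return json.dumps(policy, indent=2)
```

The resulting JSON could then be applied with something like `boto3.client("s3").put_bucket_policy(Bucket="example-bucket", Policy=deny_delete_policy("example-bucket"))`. Unlike a locked Glacier vault policy, though, anyone who is allowed to edit the bucket policy can remove the statement again.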