If I'm uploading data to S3 using the aws-cli (i.e. using aws s3 cp), does aws-cli do any work to confirm that the resulting file in S3 matches the original file, or do I somehow need to manage that myself?
Based on this answer and the Java API documentation for putObject(), it looks like it's possible to verify the MD5 checksum after upload. However, I can't find a definitive answer on whether aws-cli actually does that.
It matters to me because I'm intending to upload GPG-encrypted files from a backup process, and I'd like some confidence that what's been stored in S3 actually matches the original.
According to the FAQ in the aws-cli GitHub repository, checksums are verified in most cases during both upload and download.
Key points for uploads:
The AWS support page How do I ensure data integrity of objects uploaded to or downloaded from Amazon S3? describes how to achieve this.
First, determine the base64-encoded MD5 checksum of the file you wish to upload:
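A minimal sketch of this step, assuming a local file named backup.gpg (substitute your own filename):

```shell
# Compute the raw (binary) MD5 digest and base64-encode it.
# S3's Content-MD5 header expects base64, not the usual hex form.
md5_b64=$(openssl md5 -binary backup.gpg | base64)
echo "$md5_b64"
```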
Then use the s3api to upload the file:
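A sketch of the upload, where the bucket name my-bucket and the key are placeholders:

```shell
# Upload with --content-md5 so S3 can verify the received bytes server-side.
md5_b64=$(openssl md5 -binary backup.gpg | base64)
aws s3api put-object \
  --bucket my-bucket \
  --key backups/backup.gpg \
  --body backup.gpg \
  --content-md5 "$md5_b64"
```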
Note the use of the --content-md5 flag. The help for this flag does not say much about why to use it, but we can find this information in the API documentation for PutObject:
Using this flag causes S3 to verify server-side that the file's hash matches the specified value. If the hashes match, S3 returns the ETag:
The ETag value will usually be the hexadecimal md5sum (see this question for some scenarios where this may not be the case).
If the hash does not match the one you specified, you get an error.
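The returned ETag can be compared directly against a locally computed hex digest. A sketch, again using the placeholder bucket and filename, and valid only for non-multipart uploads (where the ETag is the plain MD5):

```shell
# Local hex MD5 (the form the ETag usually takes for single-part uploads).
local_hex=$(openssl md5 backup.gpg | awk '{print $NF}')

# Upload and capture the ETag from the put-object response, stripping quotes.
etag=$(aws s3api put-object \
        --bucket my-bucket --key backups/backup.gpg --body backup.gpg \
        --content-md5 "$(openssl md5 -binary backup.gpg | base64)" \
        --query ETag --output text | tr -d '"')

if [ "$etag" = "$local_hex" ]; then echo "match"; else echo "MISMATCH" >&2; fi
```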
In addition, you can add the file's md5sum to the object metadata as an additional check:
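A sketch of the same upload with the checksum also stored as user-defined metadata; the metadata key name md5chksum is an arbitrary choice, not anything S3 requires:

```shell
md5_b64=$(openssl md5 -binary backup.gpg | base64)
aws s3api put-object \
  --bucket my-bucket \
  --key backups/backup.gpg \
  --body backup.gpg \
  --content-md5 "$md5_b64" \
  --metadata md5chksum="$md5_b64"
```

Unlike --content-md5, the metadata value is not checked by S3 itself; it simply travels with the object so you can compare it later.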
After upload you can issue the head-object command to check the values. Here is a bash script that uses content MD5, adds metadata, and then verifies that the values returned by S3 match the local hashes: