How to get the file diff between two S3 buckets?

Posted 2019-05-07 08:57

Question:

So I have an S3 bucket of videos (several hundred), upon which I used ElasticTranscoder to transcode everything into a second, optimised bucket.

However, when I inspect my second bucket, there are 40-50 fewer objects, and I cannot figure out which ones are missing (the directory structure is deeply nested, etc.).

How can I get the file diff of two buckets using aws s3api list-objects?

Perhaps there are files in the bucket which are not videos, which I somehow didn't know about.

Answer 1:

List each bucket recursively, keep only the filenames, and diff the two sorted lists:

aws s3 ls s3://bucket-1 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_1_files
aws s3 ls s3://bucket-2 --recursive | awk '{$1=$2=$3=""; print $0}' | sed 's/^[ \t]*//' | sort > bucket_2_files

diff bucket_1_files bucket_2_files
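
Since the question asks about aws s3api specifically, here is a minimal sketch of the same idea using list-objects-v2 and comm instead of ls and diff. The function names list_keys and only_in_first are made up for illustration, and bucket-1 / bucket-2 are the same placeholder names as above. It assumes the query 'Contents[].[Key]' with text output prints one key per line, which is how the CLI normally renders a list of single-element lists.

```shell
#!/usr/bin/env bash
# Sketch only: list_keys and only_in_first are hypothetical helper names.

# Print every key in the given bucket, one per line, sorted bytewise.
# Needs the AWS CLI and credentials with permission to list the bucket.
list_keys() {
  aws s3api list-objects-v2 --bucket "$1" \
    --query 'Contents[].[Key]' --output text | LC_ALL=C sort
}

# Print the lines that appear in the first sorted file but not in the second.
# LC_ALL=C keeps comm's ordering consistent with the sort above.
only_in_first() {
  LC_ALL=C comm -23 "$1" "$2"
}

# Usage:
#   list_keys bucket-1 > bucket_1_keys
#   list_keys bucket-2 > bucket_2_keys
#   only_in_first bucket_1_keys bucket_2_keys   # keys missing from bucket-2
```

Unlike a plain diff, comm -23 prints only the keys that exist in the first bucket and not in the second, which is the list of missing objects. If the transcoding step changed file extensions or prefixes, the keys would have to be normalised (for example by stripping the extension) before comparing.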


Answer 2:

You can use the sync command with the --dryrun option, which prints the operations it would perform without copying anything, to compare the buckets instead of syncing them.

aws s3 sync s3://bucket s3://bucket2 --dryrun

You can, of course, also use it to compare a local directory with a bucket.

aws s3 sync . s3://bucket2 --dryrun