How to find duplicate files in an AWS S3 bucket?

Posted 2019-02-19 17:30

Is there a way to recursively find duplicate files in an Amazon S3 bucket? In a normal file system, I would simply use:

fdupes -r /my/directory

3 Answers
The star
#2 · 2019-02-19 18:04

Here's a Git repository: https://github.com/chilts/node-awssum-scripts, which has a JavaScript script that finds duplicates in an S3 bucket. I know pointing you to an external source isn't ideal, but I hope it helps.

孤傲高冷的网名
#3 · 2019-02-19 18:16

There is no "find duplicates" command in Amazon S3.

However, you could do the following:

  • Retrieve a list of objects in the bucket
  • Look for objects that have the same ETag (checksum) and Size

Any such objects would (extremely likely) be duplicates.
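A minimal boto3 sketch of that approach, assuming default credentials and a placeholder bucket name: list every object, group the keys by (Size, ETag), and report any group with more than one key. Keep in mind the ETag is a plain MD5 only for non-multipart uploads, so treat matches as strong candidates rather than proof.

import boto3
from collections import defaultdict

def find_duplicate_objects(bucket_name):
    # Group every key in the bucket by (size, ETag); any group with
    # more than one key is an (extremely likely) set of duplicates.
    s3 = boto3.client('s3')
    groups = defaultdict(list)
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get('Contents', []):
            groups[(obj['Size'], obj['ETag'])].append(obj['Key'])
    return {k: keys for k, keys in groups.items() if len(keys) > 1}

# 'my-bucket' is a placeholder; substitute your own bucket name.
for (size, etag), keys in find_duplicate_objects('my-bucket').items():
    print(size, etag, keys)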

The star
#4 · 2019-02-19 18:23
import boto3

# ACCESS_KEY, SECRET_KEY and region are placeholders for your own credentials.
s3client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                        aws_secret_access_key=SECRET_KEY, region_name=region)
# head_object returns the ETag, an MD5 checksum for non-multipart uploads.
etag = s3client.head_object(Bucket='myBucket', Key='index.html')['ETag']
print(etag)