How to find duplicate files in an AWS S3 bucket?

Posted 2019-02-19 17:30

Is there a way to recursively find duplicate files in an Amazon S3 bucket? In a normal file system, I would simply use:

fdupes -r /my/directory

3 Answers
The star
#2 · 2019-02-19 18:04

Here's a Git repository: https://github.com/chilts/node-awssum-scripts, which has a JavaScript script that finds duplicates in an S3 bucket. I know pointing you to an external source isn't ideal, but I hope it helps.

孤傲高冷的网名
#3 · 2019-02-19 18:16

There is no "find duplicates" command in Amazon S3.

However, you could do the following:

  • Retrieve a list of objects in the bucket
  • Look for objects that have the same ETag (checksum) and Size

Any such objects would (extremely likely) be duplicates.
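A minimal boto3 sketch of that approach, assuming default credentials and a placeholder bucket name: list every object, group the keys by (Size, ETag), and report any group with more than one key. Keep in mind the ETag is a plain MD5 only for non-multipart uploads, so treat matches as strong candidates rather than proof.

import boto3
from collections import defaultdict

def find_duplicate_objects(bucket_name):
    # Group every key in the bucket by (size, ETag); any group with
    # more than one key is an (extremely likely) set of duplicates.
    s3 = boto3.client('s3')
    groups = defaultdict(list)
    paginator = s3.get_paginator('list_objects_v2')
    for page in paginator.paginate(Bucket=bucket_name):
        for obj in page.get('Contents', []):
            groups[(obj['Size'], obj['ETag'])].append(obj['Key'])
    return {k: keys for k, keys in groups.items() if len(keys) > 1}

# 'my-bucket' is a placeholder; substitute your own bucket name.
for (size, etag), keys in find_duplicate_objects('my-bucket').items():
    print(size, etag, keys)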

The star
#4 · 2019-02-19 18:23
import boto3

# ACCESS_KEY, SECRET_KEY and region are placeholders for your own credentials.
s3client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                        aws_secret_access_key=SECRET_KEY, region_name=region)
# head_object returns the ETag, an MD5 checksum for non-multipart uploads.
etag = s3client.head_object(Bucket='myBucket', Key='index.html')['ETag']
print(etag)