How to delete multiple files in S3 bucket with AWS

Posted 2020-05-22 04:21

Question:

Suppose I have an S3 bucket named x.y.z

In this bucket, I have hundreds of files. But I only want to delete 2 files named purple.gif and worksheet.xlsx

Can I do this from the AWS command line tool with a single call to rm?

This did not work:

$ aws s3 rm s3://x.y.z/worksheet.xlsx s3://x.y.z/purple.gif
Unknown options: s3://x.y.z/purple.gif

From the manual, it doesn't seem like you can delete a list of files explicitly by name. Does anyone know a way to do it? I prefer not using the --recursive flag.

Answer 1:

aws s3 rm takes a single path argument, so it cannot delete an explicit list of files in one call, but you can use s3api delete-objects to achieve what you want here.

Example

aws s3api delete-objects --bucket x.y.z --delete '{"Objects":[{"Key":"worksheet.xlsx"},{"Key":"purple.gif"}]}'
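
For more than a couple of keys, writing the JSON by hand gets error-prone. The following is a minimal sketch, not part of the original answer: build_delete_payload is a hypothetical helper name, and it assumes a POSIX shell with sed available. It only prints the payload, so the output can be inspected before it is passed to delete-objects.

```shell
# Hypothetical helper: print a delete-objects payload for the keys given
# as arguments. Backslashes and double quotes in key names are escaped;
# control characters are NOT handled in this sketch.
build_delete_payload() {
  sep=''
  printf '{"Objects":['
  for key in "$@"; do
    # Escape backslashes first, then double quotes, for JSON.
    esc=$(printf '%s' "$key" | sed 's/\\/\\\\/g; s/"/\\"/g')
    printf '%s{"Key":"%s"}' "$sep" "$esc"
    sep=','
  done
  printf ']}\n'
}

# Possible usage (not executed here):
#   aws s3api delete-objects --bucket x.y.z \
#       --delete "$(build_delete_payload worksheet.xlsx purple.gif)"
```

Keep in mind that delete-objects accepts at most 1000 keys per request, so longer lists would need to be split into batches.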


Answer 2:

You can do this by providing an --exclude or --include argument multiple times. But, you'll have to use --recursive for this to work.

When there are multiple filters, remember that the order of the filter parameters is important. The rule is that filters appearing later in the command take precedence over filters appearing earlier.

aws s3 rm s3://x.y.z/ --recursive --exclude "*" --include "purple.gif" --include "worksheet.xlsx"

Here, all files will be excluded from the command except for purple.gif and worksheet.xlsx.

If you're unsure, always try a --dryrun first and inspect which files will be deleted.
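
As a rough sketch of that workflow, the function below turns a list of file names into repeated --include filters and always runs with --dryrun. The name preview_rm_only is hypothetical, and the sketch assumes the names contain no glob characters such as * or ?, since --include treats its argument as a pattern.

```shell
# Hypothetical helper: preview deleting only the named files under a prefix.
# Usage: preview_rm_only s3://bucket/prefix/ name1 name2 ...
preview_rm_only() {
  target=$1
  shift
  n=$#
  # Rotate through the names, appending an "--include NAME" pair for each,
  # so "$@" ends up holding only the filter arguments.
  while [ "$n" -gt 0 ]; do
    set -- "$@" --include "$1"
    shift
    n=$((n - 1))
  done
  # --exclude "*" comes first so the later --include filters take precedence.
  aws s3 rm "$target" --recursive --dryrun --exclude "*" "$@"
}

# Possible usage (not executed here):
#   preview_rm_only s3://x.y.z/ purple.gif worksheet.xlsx
```

Once the dry-run output lists exactly the expected files, the same command can be run without --dryrun.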

Source: Use of Exclude and Include Filters



Answer 3:

If you are using the AWS CLI, you can filter the output of aws s3 ls with a grep regular expression and delete the matching objects. For example:

aws s3 ls s3://BUCKET | awk '{print $4}' | grep -E -i '^2015-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9][0-9])\-([0-9a-zA-Z]*)' | xargs -I% bash -c 'aws s3 rm s3://BUCKET/%'

This is slow, because it issues one rm call per object, but it works.



Answer 4:

I found this approach useful from the command line. I had more than 4 million files, and it took almost a week to empty the bucket. It comes in handy because the AWS console does not give descriptive logs.

Note: You need the jq tool installed.

 aws s3api list-object-versions --bucket YOURBUCKETNAMEHERE \
     --output json --query 'Versions[].[Key, VersionId]' \
     | jq -r '.[] | "--key '\''" + .[0] + "'\'' --version-id " + .[1]' \
     | xargs -L1 aws s3api delete-object --bucket YOURBUCKETNAMEHERE


Answer 5:

Apparently aws s3 rm works only on individual files/objects.

Below is a bash command that works with some success (a bit slow, but works):

aws s3 ls s3://bucketname/foldername/ |
awk '{print "aws s3 rm s3://bucketname/foldername/" $4}' |
bash

The first two lines are meant to construct the "rm" commands and the 3rd line (bash) will execute them.

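Note that you might face issues if your object names contain spaces or other unusual characters. aws s3 ls does list such objects, but awk splits each line on whitespace, so $4 holds only the first word of a name that contains spaces, and the generated commands are passed to bash unquoted.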
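
One way to cope with names that contain spaces is to strip the date, time, and size columns from each listing line instead of picking the fourth whitespace-separated field. This is only a sketch: keys_from_ls and delete_listed are hypothetical names, and it assumes that each object line of aws s3 ls has the form "date time size name", with a single space before the name, and that sed supports -E.

```shell
# Hypothetical helper: read `aws s3 ls` output on stdin and print only the
# object names. Lines that do not look like "date time size name"
# (for example the "PRE folder/" lines for sub-prefixes) are dropped.
keys_from_ls() {
  sed -n -E 's/^[0-9-]+ +[0-9:]+ +[0-9]+ (.*)$/\1/p'
}

# Hypothetical wrapper: delete every object listed directly under a prefix.
# The prefix must end with "/", e.g. s3://bucketname/foldername/
delete_listed() {
  prefix=$1
  aws s3 ls "$prefix" | keys_from_ls | while IFS= read -r key; do
    # Quoting keeps names with spaces as a single argument.
    aws s3 rm "${prefix}${key}"
  done
}
```

This still issues one rm call per object, so it is no faster than the pipeline above; it only avoids truncating names at the first space.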



Answer 6:

Note that:

aws s3 rm s3://x.y.z/ --recursive --include "*.gif" removes all files under the path, including those matching "*.gif", because everything is included by default

aws s3 rm s3://x.y.z/ --recursive --exclude "*" --include "*.gif" removes only the files that match "*.gif"



Answer 7:

This solution works when you want to specify a wildcard for the object name.

aws s3 ls dmap-live-dwh-files/backup/mongodb/oms_api/hourly/ | grep 'order_2019_08_09_' | awk '{print "aws s3 rm s3://dmap-live-dwh-files/backup/mongodb/oms_api/hourly/" $4}' | bash