We want to delete objects from S3, 10 minutes after they are created. Is it possible currently?
Question:
Answer 1:
I have a working serverless solution built with AWS Simple Queue Service (SQS) and AWS Lambda. It works for all objects created in an S3 bucket.
Overview
When any object is created in your S3 bucket, the bucket sends an event with the object details to an SQS queue configured with a 10-minute delivery delay. The SQS queue is also configured to trigger a Lambda function. The Lambda function reads the object details from the event and deletes the object from the S3 bucket. All three components involved (S3, SQS and Lambda) are low cost, loosely coupled, serverless and scale automatically to very large workloads.
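To make the message flow concrete, here is a minimal sketch of the nesting the Lambda function has to unwrap: the SQS event contains records, and each record's body is a JSON string holding the S3 notification. The helper names `make_sqs_event` and `objects_to_delete` and the bucket and key values are made up for illustration, and the event is trimmed to the fields used here.

```python
import json
from urllib.parse import unquote_plus


def make_sqs_event(bucket, encoded_key):
    """Build a trimmed SQS-to-Lambda event wrapping one S3 notification."""
    s3_notification = {
        "Records": [
            {
                "eventName": "ObjectCreated:Put",
                "s3": {
                    "bucket": {"name": bucket},
                    "object": {"key": encoded_key},
                },
            }
        ]
    }
    # SQS hands the S3 notification to Lambda as a JSON string in "body".
    return {
        "Records": [
            {"eventSource": "aws:sqs", "body": json.dumps(s3_notification)}
        ]
    }


def objects_to_delete(event):
    """Return the (bucket, key) pairs named in an SQS event."""
    targets = []
    for message in event["Records"]:
        body = json.loads(message["body"])
        # s3:TestEvent messages carry no "Records" key, so default to [].
        for rec in body.get("Records", []):
            bucket = rec["s3"]["bucket"]["name"]
            # Keys arrive URL-encoded (a space becomes "+").
            key = unquote_plus(rec["s3"]["object"]["key"])
            targets.append((bucket, key))
    return targets


if __name__ == "__main__":
    event = make_sqs_event("example-bucket", "reports/my+file.txt")
    print(objects_to_delete(event))
```

Keeping the parsing separate from the delete call makes it possible to check the handler logic locally without touching AWS.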
Steps Involved
Step 1: Set up your Lambda function first. In my solution, I used Python 3.7. The code for the function is:
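    import json
    from urllib.parse import unquote_plus

    import boto3

    # Create the S3 resource once so warm invocations reuse it.
    s3 = boto3.resource("s3")


    def lambda_handler(event, context):
        # Each SQS record carries one S3 notification as a JSON string in its body.
        for record in event['Records']:
            v = json.loads(record['body'])
            # s3:TestEvent messages have no "Records" key, so skip them.
            for rec in v.get("Records", []):
                bucketName = rec["s3"]["bucket"]["name"]
                # Keys in S3 notifications are URL-encoded (a space arrives as "+").
                objectKey = unquote_plus(rec["s3"]["object"]["key"])
                s3.Object(bucketName, objectKey).delete()

        return {
            'statusCode': 200,
            'body': json.dumps('Delete Completed.')
        }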
This code and a sample message file were uploaded to a GitHub repo.
Step 2: Create a standard SQS queue, then configure it with a 10-minute delivery delay. This setting can be found under Queue Actions -> Configure Queue; it is the fourth setting down.
Step 3: Configure the SQS queue to trigger the Lambda function you created in Step 1. To do this, use Queue Actions -> Configure Trigger for Lambda Function. The setup screen is self-explanatory. If you don't see your Lambda function from Step 1, check that it was created correctly and that the queue and the function are in the same Region.
Step 4: Set up your S3 bucket so that it sends an event to the SQS queue you created in Step 2. On the main bucket screen, click the Properties tab and select Events. Click the plus sign to add an event and fill out the form. The important settings are to select All object create events and, in the last pull-down on the screen, to select the queue you created in Step 2.
Step 5 (last step): Add a policy to your Lambda function's execution role that allows it to delete objects only from the specific S3 bucket. You can do this from the Lambda function console: scroll down the function screen and configure it under Execution Role.
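The console steps for the queue, the trigger and the bucket event can also be scripted. The following is a rough sketch with boto3, not the setup I actually used: the function names and arguments are placeholders, `configure` needs real AWS credentials, and it leaves out the queue access policy that S3 needs before it may send messages to the queue.

```python
DELAY_SECONDS = 600  # 10 minutes; SQS accepts at most 900 (15 minutes)


def queue_attributes(delay_seconds=DELAY_SECONDS):
    """Attributes for sqs.create_queue; SQS expects string values."""
    if not 0 <= delay_seconds <= 900:
        raise ValueError("SQS DelaySeconds must be between 0 and 900")
    return {"DelaySeconds": str(delay_seconds)}


def notification_configuration(queue_arn):
    """Bucket notification that sends all object-create events to the queue."""
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }


def configure(bucket, queue_name, function_name):
    """Apply the settings with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above work without it

    sqs = boto3.client("sqs")
    queue_url = sqs.create_queue(
        QueueName=queue_name, Attributes=queue_attributes()
    )["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]

    # Let the queue trigger the Lambda function.
    boto3.client("lambda").create_event_source_mapping(
        EventSourceArn=queue_arn, FunctionName=function_name
    )

    # Send object-created events from the bucket to the queue.
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=bucket,
        NotificationConfiguration=notification_configuration(queue_arn),
    )
```

A call such as `configure("example-bucket", "delete-later", "delete-s3-object")` would apply all three settings; the names are examples only.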
This works for files I've copied into a single S3 bucket. The same setup could support many S3 buckets feeding one queue and one Lambda function.
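As an illustration of the delete-only permission, here is a sketch of an IAM policy document that covers one bucket or several. The helper name `delete_only_policy` and the bucket names are hypothetical; the output would be attached to the function's execution role.

```python
import json


def delete_only_policy(bucket_names):
    """IAM policy document allowing s3:DeleteObject on the given buckets only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "s3:DeleteObject",
                # Object-level ARNs: every key inside each listed bucket.
                "Resource": [f"arn:aws:s3:::{name}/*" for name in bucket_names],
            }
        ],
    }


if __name__ == "__main__":
    policy = delete_only_policy(["example-bucket-a", "example-bucket-b"])
    print(json.dumps(policy, indent=2))
```

The execution role also needs permission to read from the queue (sqs:ReceiveMessage, sqs:DeleteMessage and sqs:GetQueueAttributes) for the trigger to work; that part is not shown here.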
Answer 2:
In addition to the detailed solution proposed by @taterhead involving an SQS queue, one might also consider the following serverless solution using AWS Step Functions:
- Create a State Machine in AWS Step Functions with a Wait state of 10 minutes followed by a Task state executing a Lambda function that will delete the object.
- Configure CloudTrail and CloudWatch Events to start an execution of your state machine when an object is uploaded to S3.
It has the advantages of (1) not being bound by the 15-minute maximum delivery delay of SQS and (2) avoiding the continuous queue-polling cost generated by the Lambda function.
Inspiration: Schedule emails without polling a database using Step Functions
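A state machine along these lines could be sketched as follows, written as a Python dict and serialized to Amazon States Language JSON. The state names, the helper name and the Lambda ARN are placeholders; the Lambda function is assumed to receive the bucket and key through the execution input.

```python
import json


def delayed_delete_definition(lambda_arn, wait_seconds=600):
    """Amazon States Language definition: wait, then run the delete Lambda."""
    return {
        "Comment": "Wait, then delete the uploaded object",
        "StartAt": "WaitBeforeDelete",
        "States": {
            "WaitBeforeDelete": {
                "Type": "Wait",
                "Seconds": wait_seconds,  # 600 seconds = 10 minutes
                "Next": "DeleteObject",
            },
            "DeleteObject": {
                "Type": "Task",
                "Resource": lambda_arn,  # the Lambda function that deletes the object
                "End": True,
            },
        },
    }


if __name__ == "__main__":
    arn = "arn:aws:lambda:us-east-1:123456789012:function:delete-s3-object"
    print(json.dumps(delayed_delete_definition(arn), indent=2))
```

The printed JSON is what would be supplied as the state machine definition when creating it in Step Functions.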
Answer 3:
If anyone is still interested in this, S3 now offers lifecycle rules, which I've just been looking into, and they seem simple enough to configure in the AWS S3 console.
The "Management" tab of an S3 bucket has a button labeled "Add lifecycle rule", where you can select specific prefixes for objects and set expiration times for the objects in that bucket. Note that expiration is specified in whole days, so the shortest lifetime a rule can enforce is one day, not 10 minutes.
For a more detailed explanation, see the article AWS has published on the matter.
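For reference, the same kind of rule can be expressed in code. This is a sketch only: the helper name, rule ID, prefix and bucket name are placeholders, and the resulting dict is what would be passed to boto3's `put_bucket_lifecycle_configuration`.

```python
def expiration_rule(prefix, days=1, rule_id="expire-temp-objects"):
    """Lifecycle configuration that expires objects under a prefix after N days."""
    if not isinstance(days, int) or days < 1:
        raise ValueError("lifecycle expiration is given in whole days, minimum 1")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},  # "" would match the whole bucket
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_rule(bucket, prefix, days=1):
    """Apply the rule with boto3 (requires AWS credentials)."""
    import boto3  # imported lazily so expiration_rule works without it

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration=expiration_rule(prefix, days)
    )
```

For example, `apply_rule("example-bucket", "tmp/")` would expire objects under `tmp/` one day after creation.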