Amazon S3 - different lifecycle rule for “subdirec

Posted 2020-08-09 09:44

Question:

Let's say I have the following data structure:

  • /
  • /foo
  • /foo/bar
  • /foo/baz

Is it possible to assign to it the following life-cycle rules:

  • / (1 month)
  • /foo (2 months)
  • /foo/bar (3 months)
  • /foo/baz (6 months)

The official documentation is unfortunately self-contradictory in this regard. It doesn't seem to work with the AWS console, which makes me somewhat doubtful that the SDKs/REST API would be any different ;)

Failing that, my root problem is this: I have 4 types of projects. The most rudimentary type has a few thousand projects; the other ones have a few dozen each. I am obligated to store each type for a different period of time. Each project contains hundreds of thousands of objects. It looks more or less like this:

  • type A, 90% of projects, x storage required
  • type B, 6% of projects, 2x storage required
  • type C, 3% of projects, 4x storage required
  • type D, 1% of projects, 8x storage required

So far so simple. However, projects may be upgraded or downgraded from one type to another. As I said, I have a few thousand instances of the first type, so I can't write a specific rule for every one of them (remember the limit of 1000 rules per bucket). And since they may move from one type to another, I can't simply put them in their own folders (e.g. one folder holding only projects of a particular type) or buckets either. Or so I think? Are there any options open to me other than iterating over every object every time I want to purge expired files, which I would seriously rather not do because of the sheer number of objects?

Maybe some kind of file "move/transfer" between buckets that doesn't modify the creation time metadata, and isn't costly for our server to process?

Would be much obliged :)

Answer 1:

Lifecycle policies are based on prefix, not "subdirectory."

So if objects matching the foo/ prefix are to be deleted in 2 months, it is not logical to ask for objects with a prefix of foo/bar/ to be deleted in 3 months, because they're going to be deleted after 2 months... since they also match the prefix foo/. Prefix means prefix. Delimiters are not a factor in lifecycle rules.

Also note that keys and prefixes in S3 do not begin with /. A policy affecting the entire bucket uses the empty string as a prefix, not /.

You also probably want to remember the trailing slash when you specify prefixes, because foo/bar matches the object foo/bart.jpg while foo/bar/ does not.
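The prefix behaviour described above can be checked with plain string matching. The following is only a sketch: the rule table and the function names are hypothetical, a month is approximated as 30 days, and it assumes (as this answer describes) that the earliest matching expiration is the one that takes effect.

```python
from typing import Dict, List, Optional

# Hypothetical rules modelled on the question: prefix -> expiration in days.
# A month is approximated as 30 days (an assumption, not an S3 convention).
RULES: Dict[str, int] = {
    "foo/": 60,
    "foo/bar/": 90,
    "foo/baz/": 180,
}


def matching_prefixes(key: str, rules: Dict[str, int]) -> List[str]:
    """Return every rule prefix that the key starts with (delimiters play no role)."""
    return [prefix for prefix in rules if key.startswith(prefix)]


def earliest_expiration(key: str, rules: Dict[str, int]) -> Optional[int]:
    """Return the shortest expiration among matching rules, or None if no rule matches."""
    return min((rules[p] for p in matching_prefixes(key, rules)), default=None)
```

With these rules, `earliest_expiration("foo/bar/report.txt", RULES)` yields 60 rather than 90, because the key matches both `foo/` and `foo/bar/`. The trailing-slash point shows up the same way: `"foo/bart.jpg".startswith("foo/bar")` is true, while `"foo/bart.jpg".startswith("foo/bar/")` is false.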

Iterating over objects for deletion is not as bad as you make it out to be. The List Objects API call returns up to 1000 objects per request (or fewer, if you ask for fewer) and allows you to specify both a prefix and a delimiter (usually you'll use / as the delimiter if you want the responses grouped using the pseudo-folder model the console uses to create the hierarchical display). Each object's key and last-modified timestamp are provided in the response XML. There's also an API request to delete multiple objects in one call.
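A minimal sketch of that list-then-batch-delete loop, assuming a boto3-style S3 client is passed in (for example `boto3.client("s3")`). The function name `purge_expired` and its parameters are hypothetical; the client calls used are `get_paginator("list_objects_v2")` and `delete_objects`.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(s3_client, bucket, prefix, max_age_days, now=None):
    """Submit deletes for objects under `prefix` last modified more than
    `max_age_days` ago. Returns the number of keys submitted for deletion."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=max_age_days)
    submitted = 0
    batch = []

    def flush():
        # One multi-object delete request accepts at most 1000 keys.
        nonlocal submitted, batch
        if batch:
            s3_client.delete_objects(
                Bucket=bucket,
                Delete={"Objects": [{"Key": k} for k in batch], "Quiet": True},
            )
            submitted += len(batch)
            batch = []

    # Each listing page carries up to 1000 objects with Key and LastModified.
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["LastModified"] < cutoff:
                batch.append(obj["Key"])
                if len(batch) == 1000:
                    flush()
    flush()
    return submitted
```

The sketch does not inspect the per-key errors that a multi-object delete can report, so a production version would want to check the response of each `delete_objects` call.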

Any kind of move, transfer, copy, etc. will always reset the creation date of the object. So will modifying the metadata, because objects are immutable. Any time you move, transfer, copy, or "rename" an object (which is actually a copy and delete), or modify its metadata (which is actually a copy to the same key with different metadata), you are creating a new object.



Answer 2:

@Zardii, you can use a distinct S3 object tag [1] for the objects under each of these prefixes.

Then you can apply a lifecycle policy per tag, each with its own retention/deletion period.

[1] https://docs.aws.amazon.com/AmazonS3/latest/dev/object-tagging.html

Prefix => S3 tag:

  • / => delete_after_one_month
  • /foo => delete_after_two_months
  • /foo/bar => delete_after_three_months
  • /foo/baz => delete_after_six_month
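As a rough sketch of what tag-based rules could look like, the function below builds a lifecycle configuration in the shape accepted by boto3's `put_bucket_lifecycle_configuration`. The tag key `retention`, the day counts (a month approximated as 30 days), and the function name are assumptions for illustration; the tag values are the names listed above.

```python
# Hypothetical tag key; the tag values are the names used in this answer.
TAG_KEY = "retention"

# Tag value -> expiration in days (30 days per month is an assumption).
RETENTION_DAYS = {
    "delete_after_one_month": 30,
    "delete_after_two_months": 60,
    "delete_after_three_months": 90,
    "delete_after_six_month": 180,
}


def build_lifecycle_configuration(retention_days, tag_key=TAG_KEY):
    """Build one expiration rule per tag value, filtered by object tag."""
    rules = []
    for tag_value, days in retention_days.items():
        rules.append({
            "ID": tag_value,
            "Filter": {"Tag": {"Key": tag_key, "Value": tag_value}},
            "Status": "Enabled",
            "Expiration": {"Days": days},
        })
    return {"Rules": rules}
```

The result could then be applied with something like `boto3.client("s3").put_bucket_lifecycle_configuration(Bucket="my-bucket", LifecycleConfiguration=build_lifecycle_configuration(RETENTION_DAYS))`, where `my-bucket` is a placeholder. This needs only four rules regardless of how many projects exist, which stays well under the 1000-rule limit mentioned in the question.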