S3 boto list keys sometimes returns directory key

Posted 2020-02-23 08:04

Question:

I've noticed a difference in what boto's API returns depending on the bucket location. I have the following code:

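from boto.s3.connection import S3Connection

con = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = con.get_bucket(S3_BUCKET_NAME)
# List every key whose name starts with the given prefix
keys = bucket.list(path)
for key in keys:
    print(key)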

which I'm running against two buckets, one in us-west and one in Ireland. The path is a sub-directory in the bucket. Against Ireland I get the sub-directory itself and any keys underneath it, whereas against us-west I only get the keys beneath it.

So Ireland gives:

<Key: <bucketName>,someDir/>
<Key: <bucketName>,someDir/someFile.jpg>
<Key: <bucketName>,someDir/someOtherFile.jpg>

whereas US Standard gives:

<Key: <bucketName>,someDir/someFile.jpg>
<Key: <bucketName>,someDir/someOtherFile.jpg>

Obviously, I want to be able to write the same code regardless of bucket location. Does anyone know of a way to work around this so that I get the same predictable results? Or even whether it's boto or S3 causing the problem? I noticed there is a different policy for naming buckets in Ireland; do different locales have their own versions of the APIs?

Thanks,

Steve

Answer 1:

Thanks to Steffen, who suggested looking at how the keys are created. With further investigation I think I've got a handle on what's happening here. My original supposition that it was linked to the bucket region was a red herring. It appears to be due to what the management console does when you manipulate keys.

If you create a directory in the management console it creates a 0 byte key. This will be returned when you perform a list.

If you use boto to create/upload a file then it doesn't create the folder. Interestingly, if you delete the file from within the folder (from the AWS console) then a key is created for the folder that used to contain the file. If you then upload the file again using boto, you have exactly the same-looking structure in the UI, but in fact you have a spurious additional key for the directory. This is what was happening to me: while testing our application I was clearing out keys and then finding different results.

It's worth knowing that this happens. There is no indicator in the UI to show whether a folder is a created one (one that will be returned as a key) or an interpreted one (inferred from a key's name).
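
As a rough illustration of the difference, here is a minimal sketch that separates "created" folder keys from real objects in a listing. It assumes, as described above, that a console-created folder shows up as a zero-byte key whose name ends in '/'. FakeKey and split_folder_markers are hypothetical names used only for this example. FakeKey stands in for boto's Key, of which only the name and size attributes are used, and the sizes are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for boto's Key; only name and size are used here.
FakeKey = namedtuple("FakeKey", ["name", "size"])


def split_folder_markers(keys):
    """Separate zero-byte 'folder' placeholder keys from real objects."""
    markers, objects = [], []
    for key in keys:
        # Assumption: a created folder is a 0-byte key whose name ends in '/'.
        if key.name.endswith("/") and key.size == 0:
            markers.append(key)
        else:
            objects.append(key)
    return markers, objects


# Illustrative listing; the sizes are invented for the example.
listing = [
    FakeKey("someDir/", 0),
    FakeKey("someDir/someFile.jpg", 1024),
    FakeKey("someDir/someOtherFile.jpg", 2048),
]
markers, objects = split_folder_markers(listing)
print([k.name for k in markers])
print([k.name for k in objects])
```

An interpreted folder never appears in markers, because no key exists for it; it is only implied by the names of the objects beneath it.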



Answer 2:

I don't have a definite answer for your question, but can throw in some partial ones at least:

Background

Directory/Folder simulation

Amazon S3 doesn't actually have a native concept of folders/directories; rather, it is a flat storage architecture comprising buckets and objects/keys only. The directory-style presentation seen in most tools for S3 (including the AWS Management Console itself) is based solely on convention, i.e. simulating a hierarchy for objects with identical prefixes. See my answer to How to specify an object expiration prefix that doesn't match the directory? for more details on this architecture, including quotes/references from the AWS documentation.
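
To make the convention concrete, here is a minimal sketch of how a tool might infer "folders" purely from flat key names. The function name simulated_folders and the sample key names are hypothetical; this is not how any particular tool is implemented, just one way the prefix convention can be applied.

```python
def simulated_folders(key_names, prefix="", delimiter="/"):
    """Return the immediate 'sub-folders' under prefix, inferred from key names alone."""
    folders = set()
    for name in key_names:
        if not name.startswith(prefix):
            continue
        # Look at the part of the name that follows the prefix.
        rest = name[len(prefix):]
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # A delimiter after the prefix implies a simulated folder.
            folders.add(prefix + head + delimiter)
    return sorted(folders)


names = ["someDir/someFile.jpg", "someDir/nested/other.jpg", "top.txt"]
print(simulated_folders(names))              # ['someDir/']
print(simulated_folders(names, "someDir/"))  # ['someDir/nested/']
```

No object named someDir/ needs to exist for the folder to be shown. boto's bucket.list also accepts a delimiter argument, which asks S3 to do a similar grouping on the server side.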

API differences per region

I noticed there is a different policy for naming buckets in Ireland; do different locales have their own versions of the APIs?

That's apparently the case indeed for Amazon S3 specifically, which is one of their oldest offerings, see e.g. Bucket Restrictions and Limitations:

In all regions except for the US Standard region, You must use the following guidelines when naming a bucket. [...] [emphasis mine]

These specifics for the US Standard region are seen in other places of the S3 documentation as well, and US Standard is an unusual construct itself compared to the otherwise clearly geographically constrained Regions:

US Standard — Uses Amazon S3 servers in the United States

This is the default Region. The US Standard Region automatically routes requests to facilities in Northern Virginia or the Pacific Northwest using network maps. To use this region, select US Standard as the region when creating a bucket in the console. The US Standard Region provides eventual consistency for all requests. [emphasis mine]

This implicit CDN behavior is unique to this default Region of S3 (i.e. US Standard) and, I think, not seen on any other AWS service.

Likely Cause

I have a faint memory of S3 actually placing a zero byte object/key into a bucket for the simulated directory/folder in more recent regions (i.e. all but US Standard), whereas the legacy solution for the US Standard region might be different, for example simply based on the established naming convention for directory separation by / and omitting a dedicated object/key for this altogether.

Solution

If the analysis is correct, there is nothing you can do but maintain separate code paths for both cases, I'm afraid.

Good luck!



Answer 3:

I've had the same problem. As a workaround you can filter out all the keys with a trailing '/' to eliminate the 'directory' entries.

import boto

def files(keys):
    # Keep only keys that do not look like 'directory' placeholders
    return (key for key in keys if not key.name.endswith('/'))

s3 = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = s3.get_bucket(S3_BUCKET_NAME)
keys = bucket.list(path)
for key in files(keys):
    print(key)


Answer 4:

I'm using the fact that a "folder" has no "." in its path, whereas a file does. So media/images will not be deleted, but media/images/sample.jpg will be.

For example, to clean the files out of a bucket:

def delete_all_bucket_files(self, bucket_name):
    period_char = '.'  # names containing a period are treated as files
    bucket = self.get_bucket(bucket_name)
    if bucket:
        for key in bucket.list():
            # delete only the files, not the folders
            if period_char in key.name:
                print('deleting: ' + key.name)
                key.delete()