I've noticed a difference between the returns from boto's api depending on the bucket location. I have the following code:
from boto.s3.connection import S3Connection

con = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = con.get_bucket(S3_BUCKET_NAME)
keys = bucket.list(path)
for key in keys:
    print(key)
which I'm running against two buckets, one in us-west and one in Ireland. Here path is a sub-directory. Against Ireland I get the sub-directory itself plus any keys underneath it; against us-west I only get the keys underneath.
So Ireland gives:
<Key: <bucketName>,someDir/>
<Key: <bucketName>,someDir/someFile.jpg>
<Key: <bucketName>,someDir/someOtherFile.jpg>
whereas US Standard gives:
<Key: <bucketName>,someDir/someFile.jpg>
<Key: <bucketName>,someDir/someOtherFile.jpg>
Obviously, I want to be able to write the same code regardless of bucket location. Does anyone know of a way to work around this so that I get the same predictable results, or whether it's boto or S3 that is causing the difference? I noticed there is a different policy for naming buckets in Ireland; do different locales have their own version of the API?
Thanks,
Steve
Thanks to Steffen, who suggested looking at how the keys are created. After further investigation I think I've got a handle on what's happening here. My original supposition that it was linked to the bucket region was a red herring. It appears to be due to what the management console does when you manipulate keys.
If you create a directory in the management console it creates a 0 byte key. This will be returned when you perform a list.
If you use boto to create/upload a file then it doesn't create the folder. Interestingly, if you delete the file from within the folder (from the AWS console) then a key is created for the folder that used to contain the file. If you then upload the key again using boto, you have exactly the same looking structure in the UI, but in fact you have a spurious additional key for the directory. This is what was happening to me: while testing our application I was clearing out keys and then finding different results.
Worth knowing this happens. There is no indicator in the UI to show whether a folder is a created one (one that will be returned as a key) or an interpreted one (inferred from a key's name).
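To make the effect concrete, here is a minimal sketch that does not talk to S3 at all. The two dictionaries are hypothetical in-memory stand-ins for buckets (key name mapped to size in bytes), and list_keys is an illustrative helper of mine, not a boto call. It only shows why two buckets that look identical in the console can return different listings for the same prefix.

```python
# Hypothetical in-memory stand-ins for two buckets (key name -> size in bytes).
# Bucket A: the folder was created in the console, so a zero-byte
# "someDir/" key exists alongside the files.
bucket_with_marker = {
    "someDir/": 0,
    "someDir/someFile.jpg": 1024,
    "someDir/someOtherFile.jpg": 2048,
}

# Bucket B: the files were uploaded directly, so the folder is only
# implied by the key names.
bucket_without_marker = {
    "someDir/someFile.jpg": 1024,
    "someDir/someOtherFile.jpg": 2048,
}


def list_keys(bucket, prefix):
    """Return the key names that start with prefix, in lexicographic order."""
    return sorted(name for name in bucket if name.startswith(prefix))


print(list_keys(bucket_with_marker, "someDir/"))
print(list_keys(bucket_without_marker, "someDir/"))
```

The first listing includes the extra "someDir/" entry and the second does not, which matches the two outputs shown in the question.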
I don't have a definite answer for your question, but can throw in some partial ones at least:
Background
Directory/Folder simulation
Amazon S3 doesn't actually have a native concept of folders/directories; rather, it is a flat storage architecture composed of buckets and objects/keys only. The directory-style presentation seen in most tools for S3 (including the AWS Management Console itself) is based solely on convention, i.e. simulating a hierarchy for objects with identical prefixes. See my answer to How to specify an object expiration prefix that doesn't match the directory? for more details on this architecture, including quotes/references from the AWS documentation.
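As a rough illustration of that convention, the following sketch derives a folder view from a flat list of key names. It is plain Python and makes no S3 calls; list_level and the sample key names are hypothetical and only approximate what a tool might do when it groups keys by a prefix and a "/" delimiter.

```python
def list_level(key_names, prefix="", delimiter="/"):
    """Split flat key names into files and simulated folders directly under prefix."""
    files, folders = [], set()
    for name in key_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            # A key equal to the prefix itself (e.g. a folder marker); skip it.
            continue
        head, sep, _ = rest.partition(delimiter)
        if sep:
            # There is a further delimiter, so this key lives in a "sub-folder".
            folders.add(prefix + head + delimiter)
        else:
            files.append(name)
    return sorted(files), sorted(folders)


# Sample key names, made up for illustration.
keys = [
    "readme.txt",
    "someDir/someFile.jpg",
    "someDir/someOtherFile.jpg",
    "someDir/nested/deep.jpg",
]

print(list_level(keys))
print(list_level(keys, "someDir/"))
```

Note that no "someDir/" object has to exist for the folder to show up in the result: it is inferred purely from the names of the keys beneath it.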
API differences per region
I noticed there is a different policy for naming buckets in Ireland;
do different locales have their own version of the API?
That's apparently the case indeed for Amazon S3 specifically, which is one of their oldest offerings, see e.g. Bucket Restrictions and Limitations:
In all regions except for the US Standard region, You must use the
following guidelines when naming a bucket. [...] [emphasis mine]
These specifics for the US Standard region are seen in other places of the S3 documentation as well, and US Standard is an unusual construct itself compared to the otherwise clearly geographically constrained Regions:
US Standard — Uses Amazon S3 servers in the United States
This is the default Region. The US Standard Region automatically
routes requests to facilities in Northern Virginia or the Pacific
Northwest using network maps. To use this region, select US Standard
as the region when creating a bucket in the console. The US Standard
Region provides eventual consistency for all requests. [emphasis mine]
This implicit CDN-like behavior is unique to this default Region of S3 (i.e. US Standard) and, as far as I know, not seen on any other AWS service.
Likely Cause
I have a faint memory of S3 actually placing a zero byte object/key into a bucket for the simulated directory/folder in more recent regions (i.e. all but US Standard), whereas the legacy solution for the US Standard region might be different, for example simply based on the established naming convention for directory separation by / and omitting a dedicated object/key for this altogether.
Solution
If the analysis is correct, there is nothing you can do but maintain separate code paths for both cases, I'm afraid.
Good luck!
I've had the same problem. As a workaround you can filter out all the keys with a trailing '/' to eliminate the 'directory' entries.
import boto

def files(keys):
    # Skip 'directory' entries, i.e. keys whose name ends with '/'.
    return (key for key in keys if not key.name.endswith('/'))

s3 = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = s3.get_bucket(S3_BUCKET_NAME)
keys = bucket.list(path)
for key in files(keys):
    print(key)
I'm using the fact that a "Folder" has no "." in its path.
A file does.
media/images will not be deleted
media/images/sample.jpg will be deleted
For example, to clean out the files in a bucket:
def delete_all_bucket_files(self, bucket_name):
    bucket = self.get_bucket(bucket_name)
    if bucket:
        for key in bucket.list():
            # Delete only the files, not the folders:
            # a key counts as a file if its name contains a '.'.
            if '.' in key.name:
                print('deleting: ' + key.name)
                key.delete()
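One caveat with the "." heuristic: it misclassifies files without an extension and folders whose name contains a dot. The sketch below compares it with a check based on the trailing '/' and a zero size, which is how the console-created folder keys are described earlier in this thread. FakeKey is a hypothetical stand-in for a boto key (only name and size are modelled), and the sample names are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for a boto key; only name and size are modelled.
FakeKey = namedtuple("FakeKey", ["name", "size"])


def looks_like_file_by_dot(key):
    """The heuristic from the answer above: a file has a '.' in its name."""
    return "." in key.name


def is_folder_marker(key):
    """Assumed alternative: a zero-byte key whose name ends with '/'."""
    return key.name.endswith("/") and key.size == 0


keys = [
    FakeKey("media/images/", 0),               # folder marker
    FakeKey("media/images/sample.jpg", 5120),  # ordinary file
    FakeKey("media/LICENSE", 1200),            # file without an extension
    FakeKey("media/v1.2/", 0),                 # folder marker with a dot in its name
]

for key in keys:
    print(key.name, looks_like_file_by_dot(key), not is_folder_marker(key))
```

The two checks agree on the first two keys and disagree on the last two, so which one is safe to use depends on how the keys in your bucket are named.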