I have S3 access only to a specific directory in an S3 bucket.
For example, with the s3cmd command, if I try to list the whole bucket:
$ s3cmd ls s3://my-bucket-url
I get an error: Access to bucket 'my-bucket-url' was denied
But if I try to access a specific directory in the bucket, I can see its contents:
$ s3cmd ls s3://my-bucket-url/dir-in-bucket
Now I want to connect to the S3 bucket with Python boto. Similarly, with:
bucket = conn.get_bucket('my-bucket-url')
I get an error: boto.exception.S3ResponseError: S3ResponseError: 403 Forbidden
But if I try:
bucket = conn.get_bucket('my-bucket-url/dir-in-bucket')
the script stalls for about 10 seconds and then prints an error. Below is the full traceback. Any idea how to proceed?
Traceback (most recent call last):
  File "test_s3.py", line 7, in <module>
    bucket = conn.get_bucket('my-bucket-url/dir-name')
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 471, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 490, in head_bucket
    response = self.make_request('HEAD', bucket_name, headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/boto/s3/connection.py", line 633, in make_request
    retry_handler=retry_handler
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 1046, in make_request
    retry_handler=retry_handler)
  File "/usr/local/lib/python2.7/dist-packages/boto/connection.py", line 922, in _mexe
    request.body, request.headers)
  File "/usr/lib/python2.7/httplib.py", line 958, in request
    self._send_request(method, url, body, headers)
  File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
    self.endheaders(body)
  File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 776, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 1157, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 553, in create_connection
    for res in getaddrinfo(host, port, 0, SOCK_STREAM):
socket.gaierror: [Errno -2] Name or service not known
By default, a get_bucket call in boto tries to validate that you actually have access to the bucket by performing a HEAD request on the bucket URL. In this case you don't want boto to do that, since you don't have access to the bucket itself. (That is also what your traceback shows: boto treats the whole string, slash included, as a bucket name and builds a hostname from it, and that hostname can't resolve, hence the gaierror.) So, do this:
bucket = conn.get_bucket('my-bucket-url', validate=False)
and then you should be able to do something like this to list objects:
for key in bucket.list(prefix='dir-in-bucket'):
    print(key.name)  # or do whatever you need with each key
If you still get a 403 error, try adding a slash at the end of the prefix:
for key in bucket.list(prefix='dir-in-bucket/'):
    print(key.name)  # or do whatever you need with each key
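Putting it together, a minimal sketch, assuming your credentials live in the usual boto config or environment variables and reusing the bucket and directory names from the question:
import boto

# Connect with credentials picked up from the environment or ~/.boto.
conn = boto.connect_s3()

# Skip boto's HEAD validation; we only have access to the prefix, not the bucket.
bucket = conn.get_bucket('my-bucket-url', validate=False)

# List every key under the directory (note the trailing slash on the prefix).
for key in bucket.list(prefix='dir-in-bucket/'):
    print(key.name)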
For boto3
import boto3
s3 = boto3.resource('s3')
my_bucket = s3.Bucket('my_bucket_name')
for object_summary in my_bucket.objects.filter(Prefix="dir_name/"):
    print(object_summary.key)
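A nice property of the resource API is that the objects.filter collection handles pagination for you, so this loop sees every key under the prefix, not just the first 1,000.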
Boto3 client:
import boto3

_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'

client = boto3.client('s3', aws_access_key_id=ACCESS_KEY,
                      aws_secret_access_key=SECRET_KEY)

def ListFiles(client):
    """List files under a specific S3 prefix."""
    response = client.list_objects(Bucket=_BUCKET_NAME, Prefix=_PREFIX)
    for content in response.get('Contents', []):
        yield content.get('Key')

file_list = ListFiles(client)
for file in file_list:
    print('File found: %s' % file)
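Keep in mind that a single list_objects call returns at most 1,000 keys; if your prefix holds more objects than that, use a paginator, as in the next example.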
Using session
from boto3.session import Session

_BUCKET_NAME = 'mybucket'
_PREFIX = 'subfolder/'

session = Session(aws_access_key_id=ACCESS_KEY,
                  aws_secret_access_key=SECRET_KEY)
client = session.client('s3')

def ListFilesV1(client, bucket, prefix=''):
    """List files under a specific S3 prefix, paging through all results."""
    paginator = client.get_paginator('list_objects')
    for result in paginator.paginate(Bucket=bucket, Prefix=prefix,
                                     Delimiter='/'):
        for content in result.get('Contents', []):
            yield content.get('Key')

file_list = ListFilesV1(client, _BUCKET_NAME, prefix=_PREFIX)
for file in file_list:
    print('File found: %s' % file)
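Note that Delimiter='/' makes the listing non-recursive: keys in deeper "subfolders" are grouped under CommonPrefixes in each page rather than appearing in Contents, so they are not yielded here. Drop the Delimiter argument if you want every key under the prefix.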
The following code lists all the files in a specific directory of an S3 bucket:
import boto3

s3 = boto3.client('s3')

def get_all_s3_keys(s3_path):
    """
    Get a list of all keys under an S3 path.

    :param s3_path: Path of the S3 dir, with or without the 's3://' scheme.
    """
    keys = []

    if not s3_path.startswith('s3://'):
        s3_path = 's3://' + s3_path

    bucket = s3_path.split('//')[1].split('/')[0]
    prefix = '/'.join(s3_path.split('//')[1].split('/')[1:])
    kwargs = {'Bucket': bucket, 'Prefix': prefix}

    while True:
        resp = s3.list_objects_v2(**kwargs)
        # 'Contents' is missing from the response when a page has no keys.
        for obj in resp.get('Contents', []):
            keys.append(obj['Key'])
        try:
            kwargs['ContinuationToken'] = resp['NextContinuationToken']
        except KeyError:
            break

    return keys
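For example, with the hypothetical bucket and directory names from the question:
keys = get_all_s3_keys('s3://my-bucket-url/dir-in-bucket/')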
If you want to list only the objects under a particular folder in your bucket, you can specify that folder as a prefix while listing.
import boto

conn = boto.connect_s3(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(AWS_BUCKET_NAME)

for file in bucket.list("FOLDER_NAME/", "/"):
    print(file.name)  # do something with each file
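The two positional arguments to bucket.list here are the prefix ("FOLDER_NAME/") and the delimiter ("/"); as with the paginator example above, the delimiter keeps the listing to a single level instead of recursing into nested folders.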