I've just started to experiment with AWS SageMaker and would like to load data from an S3 bucket into a pandas DataFrame in my SageMaker Python Jupyter notebook for analysis.
I could use boto to grab the data from S3, but I'm wondering whether there is a more elegant method as part of the SageMaker framework to do this in my python code?
Thanks in advance for any advice.
In the simplest case you don't need boto3, because pandas can read the S3 object directly, which is even simpler.
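A minimal sketch of reading a CSV straight into pandas; this relies on the s3fs package being installed (pandas uses it under the hood for `s3://` URLs), and the bucket and key names below are placeholders:

```python
import pandas as pd

def load_csv_from_s3(s3_uri: str) -> pd.DataFrame:
    # pandas delegates "s3://" URLs to s3fs, which picks up the
    # notebook instance's IAM role credentials automatically
    return pd.read_csv(s3_uri)

# Usage (placeholder bucket/key):
# df = load_csv_from_s3("s3://my-bucket/path/to/data.csv")
# df.head()
```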
But as Prateek stated, make sure to configure your SageMaker notebook instance to have access to S3. This is done at the configuration step under Permissions > IAM role.
Do make sure the Amazon SageMaker role has a policy attached to it that grants access to S3. This can be done in IAM.
You can also use AWS Data Wrangler https://github.com/awslabs/aws-data-wrangler:
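A sketch of what this looks like, assuming the `awswrangler` package is installed and using its `wr.s3.read_csv` call; the bucket path is a placeholder:

```python
def read_csv_with_wrangler(s3_uri):
    # Lazy import so the sketch doesn't require awswrangler at load time;
    # install it separately with: pip install awswrangler
    import awswrangler as wr
    return wr.s3.read_csv(s3_uri)

# Usage (placeholder path):
# df = read_csv_with_wrangler("s3://my-bucket/path/to/data.csv")
```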
If you have a look here it seems you can specify this in the InputDataConfig. Search for "S3DataSource" (ref) in the document; the first hit is even in Python, on page 25/26.
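As a hedged sketch, an `InputDataConfig` entry with an `S3DataSource` (as passed to boto3's `create_training_job`) might look like the following; the channel name, S3 URI, and content type are placeholders:

```python
# One channel of an InputDataConfig list for CreateTrainingJob;
# replace the placeholder values with your own bucket and prefix.
input_data_config = [
    {
        "ChannelName": "train",
        "DataSource": {
            "S3DataSource": {
                "S3DataType": "S3Prefix",
                "S3Uri": "s3://my-bucket/train/",
                "S3DataDistributionType": "FullyReplicated",
            }
        },
        "ContentType": "text/csv",
    }
]
```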
You could also access your bucket as a file system using s3fs.
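A minimal sketch of the s3fs approach, assuming the `s3fs` package is installed; the bucket and key names are placeholders:

```python
import pandas as pd

def read_csv_via_s3fs(bucket_and_key: str) -> pd.DataFrame:
    # Lazy import: install with pip install s3fs
    import s3fs
    fs = s3fs.S3FileSystem(anon=False)  # anon=False -> use the IAM role's credentials
    with fs.open(bucket_and_key, "rb") as f:
        return pd.read_csv(f)

# Usage (placeholder bucket/key):
# df = read_csv_via_s3fs("my-bucket/path/to/data.csv")
```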