How to load a pickle file from S3 to use in AWS Lambda

Published 2020-02-21 08:40

Question:

I am currently trying to load a pickled file from S3 into AWS Lambda and store it in a list (the pickle is a list).

Here is my code:

import pickle
import boto3

s3 = boto3.resource('s3')
with open('oldscreenurls.pkl', 'rb') as data:
    old_list = s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)

I get the following error even though the file exists:

FileNotFoundError: [Errno 2] No such file or directory: 'oldscreenurls.pkl'

Any ideas?

Answer 1:

As shown in the documentation for download_fileobj, you need to open the file in binary write mode and save the download to it first (in Lambda that means writing under /tmp, the only writable directory). Once the file has been downloaded, you can reopen it for reading and unpickle it.

import pickle
import boto3

s3 = boto3.resource('s3')
# In Lambda, only /tmp is writable, so save the download there.
with open('/tmp/oldscreenurls.pkl', 'wb') as data:
    s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)

with open('/tmp/oldscreenurls.pkl', 'rb') as data:
    old_list = pickle.load(data)

download_fileobj takes the name of an object in S3 plus a handle to a local file, and saves the contents of that object to the file. There is also a version of this function called download_file that takes a filename instead of an open file handle and handles opening it for you.
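
For example, a minimal sketch using download_file with the same bucket and object key as above; the /tmp path is used because that is the only writable directory in a Lambda environment:

import pickle
import boto3

s3 = boto3.resource('s3')
# download_file opens (and closes) the local file for you; just pass a destination path.
s3.Bucket("pythonpickles").download_file("oldscreenurls.pkl", "/tmp/oldscreenurls.pkl")

with open("/tmp/oldscreenurls.pkl", "rb") as data:
    old_list = pickle.load(data)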

In this case, though, it would probably be better to use S3Client.get_object to avoid writing a file only to read it straight back (a short get_object sketch follows the example below). You could also write to an in-memory BytesIO object, which acts like a file but never actually touches the disk. That would look something like this:

import pickle
import boto3
from io import BytesIO

s3 = boto3.resource('s3')
with BytesIO() as data:
    s3.Bucket("pythonpickles").download_fileobj("oldscreenurls.pkl", data)
    data.seek(0)    # move back to the beginning after writing
    old_list = pickle.load(data)
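
For reference, here is a sketch of the get_object route mentioned above, using the same bucket and key as the question; it reads the object body straight into memory and unpickles it without any file at all:

import pickle
import boto3

s3_client = boto3.client('s3')
# get_object returns the object body as a stream; read it fully and unpickle it.
response = s3_client.get_object(Bucket="pythonpickles", Key="oldscreenurls.pkl")
old_list = pickle.loads(response['Body'].read())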


Answer 2:

Super simple solution

import pickle
import boto3

s3 = boto3.resource('s3')
# Read the pickled object directly from S3 into memory and unpickle it in one line.
my_pickle = pickle.loads(s3.Bucket("bucket_name").Object("key_to_pickle.pickle").get()['Body'].read())


Answer 3:

This is the easiest solution. You can load the data without even downloading the file locally, using S3FileSystem from the s3fs package:

import pickle
from s3fs.core import S3FileSystem

s3_file = S3FileSystem()

# bucket_name and file_path are placeholders for your bucket and object key.
data = pickle.load(s3_file.open('{}/{}'.format(bucket_name, file_path)))
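
Note that s3fs is a separate dependency (pip install s3fs), so for Lambda it would have to be bundled into your deployment package or a layer; it resolves AWS credentials the same way boto3 does.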