Build custom AWS Lambda layer for Scikit-image

Posted 2020-03-04 09:12

Question:

Outline: I need to use scikit-image inside some AWS lambda functions, so I'm looking to build a custom AWS lambda layer containing scikit-image.

My questions in general should apply to any python module, notably scikit-learn, or any custom layer in general I think.


Background: After much googling and reading it seems the best way to do that is to use docker to run the AWS lambda runtime locally, and then inside there install/compile scikit-image (or whichever module you're looking for). After that's done, you can upload/install it to AWS as a custom layer.

This is conceptually pretty simple, but I'm struggling a bit to find the best-practice way to do this. I've got this working, but I'm not sure I'm doing it the best/right/optimal/secure way ... there are a million all-slightly-different blog posts about this, and the AWS docs themselves are (IMHO) too detailed but skip over some of the basic questions.

I've been trying to basically follow two good medium posts, here and here ...kudos to those guys.


My main questions are:

  1. Where is the best place to find the latest AWS AMI docker image?

There are multiple locations/versions etc (even on amazon itself) for what is supposedly the latest image, eg https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html, or https://cdn.amazonlinux.com/os-images/2.0.20190823.1/.

This is ignoring the multitude of non-amazon github hosted possibilities, such as lambci/lambda:build-python3.6 from medium posts here, or onema/amazonlinux4lambda from here.

I'd prefer to use an amazon provided docker image, for both security and up-to-date'ness.

  2. Is the AWS lambda runtime here, which links to this AMI, a docker image? If so (or not) how do you download it to run it locally?
  3. How do you ensure you know when you might need to rebuild a layer, because the AWS lambda runtime is changed by amazon and that breaks your layer built against an older runtime?
  4. Is it better to build (compile in the case of scikit-image) the pip installed module inside of the docker AMI container, or simply just to tell pip to download the pre-built version and hope/trust it will get the compiled libs that are the best for the AMI you're running?

Basically here I'm concerned about stability and performance. I'd like to ensure that the compiled libraries for scikit-image in this case are as optimized as possible for the AMI container.

  5. Is it better to just download and use AWS's SAM to do all of this? (looks like overkill and complicated, but it does look like it takes care of ensuring you're using the 'correct' AMI docker container all the time)
  6. Are there any (good, trustable) repos of pre-built lambda layers around (that might make all this a moot point)? I looked but couldn't find any.

...thanks for any advice, thoughts and comments!

Answer 1:

Interesting couple of days figuring this out. ...hopefully the answer below will be some help to anyone struggling to figure out how to make a custom layer (for python but also other languages).


Where is the best place to find the latest AWS AMI docker image?

As Greg points out, the "right" docker image to use for building layers is lambci/lambda:build-python3.7. That is the repo SAM itself uses for its docker build images.

The full list for all AWS lambda runtime environments, not just python, is here


What's the best way to build your own AWS lambda layer? ...What's the best way to build a custom python module layer?

The best way I found, to date, is to use AWS's SAM in combination with some tweaks I used from a great blog here.

The tweaks are needed because (at the time I'm writing this) AWS SAM lets you define your layers, but won't actually build them for you. ...See this request from the SAM group's github.

I'm not going to try to explain this in huge detail here - instead please check out the bryson3gps blog. He explains it well, and all the credit to him.


OK, a quick background on the process to use:

At present, AWS SAM won't build your layer for you.

Meaning, if you define a requirements.txt for a set of modules to install in a layer, it won't actually install/build them into a local directory ready to upload to AWS (as it does if you use it to define a lambda function).

But, if you define a layer in SAM, it will package (zip everything and upload to S3) and deploy (define it within AWS Cloud with ARN etc etc so it can be used) that layer for you.


The way to get SAM to build your layers too

The hack, at present, to "fool" SAM into also building your layer for you, from the bryson3gps blog here, is to:

  1. Define a dummy AWS lambda function template in SAM. Then for that function, make a pip requirements.txt that SAM will use during the build to load the modules you want into your layer. You won't actually use this function for anything.

This entails making a SAM template.yaml file that defines a basic function. Check out the SAM tutorial, then look at bryson3gps' blog. It's pretty easy.

  2. Define an AWS layer in the same template.yaml file. Again not too hard - check out the blog.

  3. In the SAM spec for your layer definition, set ContentUri (ie where it looks for the files/directories to zip and upload to AWS) to the build location for the function you defined in (1).

So, when you use sam build, it will build the function for you (ie process requirements.txt for the function) and put the resulting function packages in a directory to later zip up and send to AWS.

But (this is the key) the layer you defined has its ContentUri pointing to the same directory that sam build created for the (dummy) function.

So then, when you tell SAM to package (send to S3) and deploy (configure with AWS) for the template as a whole, it will upload/create the layer that you defined, but it will also use the correct contents for the layer that got built for the (dummy) function.

It works well.
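To make the three steps above concrete, here is a minimal sketch of what such a template.yaml might look like, generated from Python so the function name and the layer's ContentUri stay in sync. The logical IDs (DummyFunction, SkimageLayer), the layer_src directory and the app.handler entry point are hypothetical names, and the .aws-sam/build/<function id> path assumes sam build's default output location; check the blog and the SAM docs for the exact layout on your version.

```python
from textwrap import dedent


def render_template(function_id="DummyFunction", layer_id="SkimageLayer",
                    runtime="python3.7", code_dir="layer_src"):
    """Return SAM template text: a dummy function plus a layer reusing its build output."""
    return dedent(f"""\
        AWSTemplateFormatVersion: '2010-09-09'
        Transform: AWS::Serverless-2016-10-31
        Resources:
          {function_id}:
            Type: AWS::Serverless::Function
            Properties:
              CodeUri: {code_dir}/
              Handler: app.handler
              Runtime: {runtime}
          {layer_id}:
            Type: AWS::Serverless::LayerVersion
            Properties:
              ContentUri: .aws-sam/build/{function_id}
              CompatibleRuntimes:
                - {runtime}
        """)


if __name__ == "__main__":
    # Print the template so it can be redirected into template.yaml.
    print(render_template())
```

Here layer_src/ would hold the requirements.txt listing scikit-image plus a trivial app.py, since the dummy function only exists to trigger the pip install.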

A couple of extra tips

1

In bryson3gps' blog, he points out that this method doesn't put the layer's packages in the location where the lambda runtime looks for them by default (for python that is /opt/python). Instead they are placed in /opt.

His way around this is to add /opt to the sys.path in your lambda scripts prior to importing:

import sys
sys.path.append('/opt')  # layer contents were unpacked directly under /opt
import <a module in your layer>
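As a slightly fuller sketch of that workaround, the path tweak can be wrapped in a small helper so it is applied only once per warm container. The handler below and the skimage import are illustrative assumptions (they presume scikit-image is what the layer contains), not code from the blog.

```python
import sys

LAYER_ROOT = "/opt"  # where Lambda unpacks layer contents


def ensure_on_path(directory=LAYER_ROOT):
    """Append directory to sys.path once; return True if it was added."""
    if directory in sys.path:
        return False
    sys.path.append(directory)
    return True


def handler(event, context):
    # Hypothetical handler: make the layer importable, then import from it.
    ensure_on_path()
    import skimage  # assumes scikit-image was installed into the layer
    return {"skimage_version": skimage.__version__}
```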

Instead of doing that, prior to sam package uploading to S3 (after sam build), you can go into the appropriate .aws-sam/<your package subdir> directory and move everything into a new python directory within that package directory. This results in the layer modules being placed in /opt/python correctly, instead of just /opt.

cd .aws-sam/<wherever your package is>/
mkdir .python
mv * .python
mv .python python
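If you would rather script that step than type it each time, the same move can be sketched in Python. The function name nest_under_python is made up for illustration, and unlike the shell version (where * skips dotfiles) this one also moves hidden entries.

```python
import shutil
from pathlib import Path


def nest_under_python(build_dir):
    """Move every entry in build_dir into build_dir/python.

    After this, zipping build_dir yields a layer that unpacks to /opt/python.
    """
    build_dir = Path(build_dir)
    target = build_dir / "python"
    # Snapshot the entries first so the new python/ directory is not moved into itself.
    entries = [p for p in build_dir.iterdir() if p.name != "python"]
    target.mkdir(exist_ok=True)
    for entry in entries:
        shutil.move(str(entry), str(target / entry.name))
    return target
```

Call it with the function's build directory after sam build and before sam package.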

2

If you're making a python layer with compiled code (eg scikit-image that I'm using) make sure you use sam build -u (with the -u flag).

That makes sure the build (pip-installing requirements.txt) happens inside a docker container matching the AWS lambda runtime, so pip downloads the correct libs for that runtime.

3

If you're including any modules that depend on numpy or scipy, then after sam build -u, but before package/deploy, make sure you go into the appropriate .aws-sam/<your package> directory that is built and remove the numpy and scipy modules that the dependency will install:

cd .aws-sam/<wherever your package is>/
rm -r numpy*
rm -r scipy*

Instead you should specify to use the AWS supplied numpy/scipy layer in your lambda function.

I couldn't find a way to tell SAM to run pip with --no-deps, so you have to do this manually.
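Since this cleanup has to be repeated after every build, here is a minimal Python sketch of it. The name strip_packages is hypothetical; like the numpy* and scipy* globs it matches on name prefix, so it also removes the matching .dist-info directories (and would remove an unrelated package whose name merely starts with numpy or scipy).

```python
import shutil
from pathlib import Path


def strip_packages(build_dir, prefixes=("numpy", "scipy")):
    """Delete entries in build_dir whose names start with any prefix; return their names."""
    build_dir = Path(build_dir)
    removed = []
    for entry in sorted(build_dir.iterdir()):
        if not entry.name.startswith(tuple(prefixes)):
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # package or .dist-info directory
        else:
            entry.unlink()  # single-file module or symlink
        removed.append(entry.name)
    return removed
```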



Answer 2:

I'm not an expert at this, but I happened to have the very same set of questions on the same day. However I can answer questions #1 and #2. Taking them out of order:

2) An AMI is not a docker image; it's for use in an EC2 instance.

1) Here is how I got the appropriate docker image:

I installed SAM cli and executed the following commands:

sam init --runtime python3.7 (sets up hello world example)
sam build -u (builds app, -u means use a container)

Output from sam build -u:

Fetching lambci/lambda:build-python3.7 Docker container image

So there you go. You can either get the image from dockerhub directly or if you have SAM cli installed, you can execute "sam build -u". Now that you have the image, you don't have to follow the full SAM workflow, if you don't want the overhead.
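For the route that skips the SAM workflow, a rough sketch of using that image directly is to mount your project directory into the container and let pip install into a python/ subdirectory. The helper below only builds the docker command; it assumes the image's working directory is /var/task (as commonly used with the lambci build images) and that a requirements.txt sits in the project directory. The function name is hypothetical.

```python
from pathlib import Path


def docker_pip_command(project_dir, image="lambci/lambda:build-python3.7",
                       target="python"):
    """Build a docker command that pip-installs requirements.txt into target/."""
    project_dir = Path(project_dir).resolve()
    return [
        "docker", "run", "--rm",
        # Assumption: the build image works out of /var/task.
        "-v", f"{project_dir}:/var/task",
        image,
        "pip", "install", "-r", "requirements.txt", "-t", target,
    ]
```

Running it with subprocess.run(docker_pip_command("."), check=True) should leave the installed modules in ./python, ready to be zipped and uploaded as a layer.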