How to deploy large Python packages with AWS Lambda

Published 2020-03-30 08:36

Question:

I need some advice.

I trained an image classifier using TensorFlow and want to deploy it to AWS Lambda using Serverless. The deployment package includes the model, some Python modules (including tensorflow and numpy), and my Python code. The unzipped size of the complete folder is 340 MB, which AWS Lambda rejects with the error message "The unzipped state must be smaller than 262144000 bytes" (roughly 250 MB).

How should I approach this? Can I not deploy packages like these on AWS Lambda?

Note: The requirements.txt file lists two modules, numpy and tensorflow. (TensorFlow is a big module.)

Answer 1:

I know I am answering this very late; just putting it here for reference for other people. I did the following things:

  1. Delete the /external/*, /tensorflow/contrib/*, and /tensorflow/include/unsupported/* files, as suggested here.
  2. Strip all .so files, especially the two files in site-packages/numpy/core: _multiarray_umath.cpython-36m-x86_64-linux-gnu.so and _multiarray_tests.cpython-36m-x86_64-linux-gnu.so. Stripping reduces their size considerably.
  3. Put your model in an S3 bucket and download it at runtime instead of bundling it, which reduces the size of the zip. This is explained in detail here; a minimal sketch follows this list.
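For reference, here is a minimal sketch of step 3: downloading the model from S3 into /tmp at cold start and reusing it on warm invocations. The bucket name, key, and model path are placeholders, and it assumes a Keras-format model:

```python
import os

import boto3
import tensorflow as tf

# Placeholder names -- replace with your own bucket and key.
MODEL_BUCKET = "my-model-bucket"
MODEL_KEY = "classifier/model.h5"
MODEL_PATH = "/tmp/model.h5"  # /tmp is the only writable path in Lambda

_model = None

def _load_model():
    """Download the model once per container and cache it in memory."""
    global _model
    if _model is None:
        if not os.path.exists(MODEL_PATH):
            # Only the first (cold) invocation pays the download cost.
            boto3.client("s3").download_file(MODEL_BUCKET, MODEL_KEY, MODEL_PATH)
        _model = tf.keras.models.load_model(MODEL_PATH)
    return _model

def handler(event, context):
    model = _load_model()
    # ... decode the image from `event`, run model.predict(...), return labels
    return {"statusCode": 200}
```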

If this does not work, there are some additional things that can be done, like removing .pyc files, etc., as mentioned here.



Answer 2:

You could use the ephemeral disk capacity (/tmp), which has a limit of 512 MB, but in your case memory will still be an issue.

The best choice may be to use AWS Batch; if Serverless does not manage it, you can even keep a Lambda function to trigger your Batch job, as sketched below.
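A minimal sketch of such a trigger, using boto3's Batch client; the job name, queue, and job definition below are placeholders:

```python
import boto3

batch = boto3.client("batch")

def handler(event, context):
    # Placeholder names -- replace with your own queue and job definition.
    response = batch.submit_job(
        jobName="image-classifier-run",
        jobQueue="my-job-queue",
        jobDefinition="my-classifier-job:1",
    )
    # submit_job returns immediately; the Batch job runs asynchronously.
    return {"jobId": response["jobId"]}
```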



Answer 3:

The best way to do it is to use the Serverless Framework, as outlined in this article. It zips the dependencies using a Docker image that mimics Amazon's Linux environment. Additionally, it automatically uses S3 as the code repository for your Lambda, which raises the upload size limit. The article is an extremely helpful guide and reflects how developers commonly deploy TensorFlow and other large libraries on AWS.
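As a minimal sketch, that Docker-based packaging is enabled through the serverless-python-requirements plugin in serverless.yml; the service and function names here are placeholders:

```yaml
service: image-classifier  # placeholder service name

provider:
  name: aws
  runtime: python3.6

plugins:
  - serverless-python-requirements

custom:
  pythonRequirements:
    dockerizePip: true  # build dependencies inside a Lambda-like Docker image

functions:
  classify:
    handler: handler.handler  # placeholder module.function
```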

If you're still running into the 250 MB size limit, you can try to follow this article, which uses the same serverless-python-requirements plugin as the previous one, but with the option slim: true. This helps you compress your packages optimally by removing unnecessary files from them (stripping .so files and dropping compiled artifacts such as *.pyc), which decreases your package size both before AND after unzipping.
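In serverless.yml, that option sits alongside the Docker setting shown above; a sketch:

```yaml
custom:
  pythonRequirements:
    dockerizePip: true
    slim: true  # strip .so files and drop *.pyc/__pycache__ from the package
```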