I have 4 machine learning models of 2GB each, i.e. 8GB in total. I receive around 100 requests at a time, and each request takes about 1 second to process. The machine has 15GB of RAM. If I increase the number of Gunicorn workers, total memory consumption rises accordingly, so I can't go beyond 2 workers.
So I have a few questions about this:

- How can workers share models or memory between them?
- Which worker type, sync or async, is suitable in this situation?
- How do I use the `preload` option in Gunicorn, if that is a solution? I tried it, but it didn't help; maybe I am using it incorrectly.
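For context, this is roughly how I understand `--preload` is supposed to be combined with module-level model loading, so the master process loads the models once before forking and the workers share the pages via copy-on-write. This is only a sketch; `load_model` and the model names below are placeholders, not my actual code:

```python
# Load models once at module import time. With gunicorn's --preload flag,
# the master imports this module before forking, so workers inherit the
# already-loaded models via copy-on-write instead of each loading 8GB.

_MODEL_CACHE = {}

def load_model(name):
    # Placeholder for the real loading call (e.g. joblib.load(path));
    # returns a dummy object here so the sketch is self-contained.
    return {"name": name}

def get_model(name):
    # Load each model at most once per process.
    if name not in _MODEL_CACHE:
        _MODEL_CACHE[name] = load_model(name)
    return _MODEL_CACHE[name]

# Executed at import time, i.e. before the fork when --preload is used:
MODELS = {n: get_model(n) for n in ("model_a", "model_b", "model_c", "model_d")}
```

launched with something like `gunicorn --preload --workers 2 agent_api:app`. Is this the intended pattern, or does the sharing break down (e.g. because Python reference counting dirties the pages)?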
Here is the Flask code I am using: https://github.com/rathee/learnNshare/blob/master/agent_api.py