I have 4 machine learning models of 2GB each, i.e. 8GB in total. I am getting around 100 requests at a time, and each request takes around 1 sec. I have a machine with 15GB RAM. If I increase the number of workers in Gunicorn, total memory consumption goes up, so I can't increase the number of workers beyond 2.
So I have a few questions about this:

- How can workers share the models, or memory, between them?
- Which type of worker, sync or async, is suitable for this situation?
- How do I use the preload option in Gunicorn, if that is a solution? I used it, but it was of no help; maybe I am using it the wrong way.
Here is the Flask code I am using: https://github.com/rathee/learnNshare/blob/master/agent_api.py
Use the gevent worker (or another event-loop worker), not the default worker. The default sync worker handles one request at a time per worker process, while an async worker can handle many concurrent requests per worker process, as long as each request is non-blocking.
You need to install gevent for this:

pip install gevent
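On the preload question: with --preload (or simply loading the models at module level), the Gunicorn master process loads the models once before forking, and the workers inherit them through the operating system's copy-on-write fork semantics, so the 8GB is not duplicated per worker as long as the workers only read the models. A minimal stdlib sketch of that idea, where load_model and its dict payload are hypothetical stand-ins for your real model loading:

```python
import os

def load_model():
    # Hypothetical stand-in for an expensive model load; in your app this
    # would be the code that reads the 2GB model files.
    return {"weights": list(range(1000))}

# Loaded once at module level, before any fork -- this is what
# gunicorn --preload arranges for you in the master process.
MODEL = load_model()

pid = os.fork()
if pid == 0:
    # Child "worker": MODEL is already available via copy-on-write,
    # without loading it again.
    assert MODEL["weights"][0] == 0
    os._exit(0)
else:
    # Parent: wait for the child and report its exit status.
    _, status = os.waitpid(pid, 0)
    print("child exit status:", os.WEXITSTATUS(status))
```

With Gunicorn itself, the equivalent is to load the models at module scope in your agent_api.py and start with something like gunicorn --preload -k gevent -w 2 agent_api:app (the worker count and module name here are assumptions based on your setup). Note that if the workers write to the model objects, the touched memory pages get copied per process and the sharing benefit erodes.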