I am trying to serve long-running requests using gunicorn and its async workers, but I can't find any examples that I can get to work. I used the example here, tweaked to add a fake delay (a 5-second sleep) before returning the response:
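import time

def app(environ, start_response):
    # WSGI response bodies must be bytes on Python 3
    data = b"Hello, World!\n"
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data)))
    ])
    time.sleep(5)  # fake delay to simulate a long-running request
    return iter([data])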
Then I run gunicorn like so:
gunicorn -w 4 myapp:app -k gevent
When I open up two browser tabs, type in http://127.0.0.1:8000/ in both of them, and send the requests at almost the same time, the requests appear to get processed sequentially: one returns after 5 seconds and the other returns after a further 5 seconds.
Q. I am guessing the sleep isn't gevent-friendly? But there are 4 workers, so even if the worker type were 'sync', shouldn't two workers handle the two requests simultaneously?
I just ran into the same thing and opened a question here: Requests not being distributed across gunicorn workers. The upshot is that the browser appears to serialize access to the same page. My guess is that this has something to do with cacheability: the browser thinks the page is likely cacheable, waits until it loads, finds out it isn't, and only then makes the next request, and so on.
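One way to take the browser out of the picture is to send the two requests from a small script. The following is a minimal sketch using only the standard library; the names timed_fetch and fetch_concurrently are made up for this illustration, and it assumes the server is reachable directly, without a proxy.

```python
import threading
import time
import urllib.request

# Bypass any configured HTTP proxy, since the server is assumed to be local.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def timed_fetch(url, results, index):
    """Fetch url and store (elapsed_seconds, body_or_error) in results[index]."""
    start = time.monotonic()
    try:
        with _opener.open(url, timeout=30) as response:
            outcome = response.read()
    except OSError as exc:  # covers URLError and socket timeouts
        outcome = exc
    results[index] = (time.monotonic() - start, outcome)


def fetch_concurrently(url, count=2):
    """Send `count` requests at the same time, one thread per request."""
    results = [None] * count
    threads = [
        threading.Thread(target=timed_fetch, args=(url, results, i))
        for i in range(count)
    ]
    start = time.monotonic()
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    # Total wall time plus the per-request (elapsed, outcome) pairs.
    return time.monotonic() - start, results
```

Calling fetch_concurrently("http://127.0.0.1:8000/") while gunicorn is running returns the total wall time and the per-request timings. If the total is close to 5 seconds rather than 10, the requests were handled in parallel and the serialization seen in the browser came from the browser itself.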
When using gunicorn with a non-blocking worker type, like gevent, it will use ONLY ONE process to deal with requests, so it's no surprise that your 5-second work was carried out sequentially.
The async worker is useful when your workload is light and the request rate is high. In that case, gunicorn can make use of the time otherwise wasted waiting on IO (for example, waiting for a socket to become writable so the response can be written to it) by switching to another request assigned to the same worker.
UPDATE
I was wrong.
When using gunicorn with a non-blocking worker type and the workers setting, each worker is a separate process that runs its own queue.
So if the time.sleep calls run in different processes, they run simultaneously, but when they run in the same worker, they are carried out sequentially.
The problem is that gunicorn may not have distributed the two requests to two different worker processes. You can check which process handled a request with os.getpid().
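A minimal sketch of that check, based on the app from the question: the response body carries the PID of the worker that handled the request. The message text is made up for this illustration.

```python
import os
import time


def app(environ, start_response):
    # Report which worker process handled this request.
    data = f"Handled by PID {os.getpid()}\n".encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain"),
        ("Content-Length", str(len(data))),
    ])
    time.sleep(5)  # same fake delay as in the question
    return iter([data])
```

If two overlapping requests report the same PID, they were handled by the same worker; different PIDs mean they were spread across workers.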
Give gevent.sleep a shot instead of time.sleep.
It's weird that this is happening with -w 4, but -k gevent is an async worker type, so it's possible gunicorn is feeding both requests to the same worker. Assuming that's what's happening, time.sleep will block your process unless you use gevent.monkey.patch_all().
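The difference between a blocking sleep and a cooperative one inside a single process can be seen without gevent installed. The sketch below uses the standard library's asyncio as a stand-in for gevent's event loop; it is an analogy, not gevent's API, and the handler names are made up. Two handlers run concurrently in one process, once with time.sleep and once with a sleep that yields control.

```python
import asyncio
import time


async def blocking_handler():
    time.sleep(0.2)  # blocks the whole event loop, like an unpatched sleep


async def cooperative_handler():
    await asyncio.sleep(0.2)  # yields so that the other handler can run


async def time_two(handler):
    """Run two handlers concurrently and return the elapsed wall time."""
    start = time.monotonic()
    await asyncio.gather(handler(), handler())
    return time.monotonic() - start


def compare():
    """Return (blocking_seconds, cooperative_seconds) for two handlers each."""
    blocking = asyncio.run(time_two(blocking_handler))
    cooperative = asyncio.run(time_two(cooperative_handler))
    return blocking, cooperative
```

With compare(), the blocking pair should take roughly twice as long as the cooperative pair, which is the same pattern as two requests landing on one worker whose sleep does not yield.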