asyncio: why isn't it non-blocking by default?

Posted 2019-08-08 19:21

By default, asyncio runs coroutines synchronously. If they contain blocking I/O code, they still wait for it to return. A way around this is loop.run_in_executor(), which runs the code in separate threads. If a thread blocks on I/O, another thread can start executing, so you don't waste time waiting for I/O calls.

If you use asyncio without executors, you lose those speedups. So I was wondering: why do you have to use executors explicitly? Why not enable them by default? (In the following, I'll focus on HTTP requests, but they really only serve as an example. I'm interested in the general principles.)

After some searching I found aiohttp, a library that essentially offers a combination of asyncio and requests: non-blocking HTTP calls. With executors, asyncio and requests behave pretty much like aiohttp. Is there a reason to implement a new library? Do you pay a performance penalty for using executors?

This was answered in another question, "Why doesn't asyncio always use executors?". Mikhail Gerasimov explained to me that executors spin up OS threads, which can become expensive, so it makes sense not to make them the default behaviour. aiohttp is better than using the requests module in an executor, since it offers non-blocking code using only coroutines.

Which brings me to this question. aiohttp advertises itself as:

Asynchronous HTTP Client/Server for asyncio and Python.

So aiohttp is based on asyncio? Why doesn't asyncio offer non-blocking code with only coroutines then? That would be the ideal default.

Or did aiohttp implement this new event-loop (without OS-threads) itself? In that case I don't understand why they advertise themselves as based on asyncio. Async/await are a language feature. Asyncio is an event-loop. If aiohttp had its own event loop, there would be little overlap with asyncio. Actually, I would argue that such an event loop would be a much bigger feature than HTTP requests.

2 answers
Viruses.
Reply #2 · 2019-08-08 19:51

asyncio is asynchronous because coroutines cooperate voluntarily. All asyncio code must be written with cooperation in mind; that's the entire point. Otherwise you may as well use threading exclusively to achieve concurrency.

asyncio can't simply run 'blocking' functions (non-coroutine functions or methods that won't cooperate) in an executor, because it can't assume that such code is safe to run in a separate executor thread, or even that it needs to run in an executor at all.

The Python standard library is full of really useful code that asyncio projects will want to use. Most of the standard library consists of regular, 'blocking' function and class definitions. They do their work quickly, so even though they 'block', they return in a reasonable time.

Most of that code is also not thread-safe; it usually doesn't need to be. But as soon as asyncio ran all such code in an executor automatically, you could no longer use non-thread-safe functions. Besides, creating a thread to run synchronous code in is not free: creating the thread object costs time, and your OS won't let you run an unlimited number of threads either. Loads of standard library functions and methods are fast, so why would you want to run str.splitlines() or urllib.parse.quote() in a separate thread when it is much quicker to just execute the code and be done with it?
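To make the cost argument concrete, here is a rough timing sketch (the constant TEXT and the function compare() are illustrative names, not from the answer). It runs str.splitlines() directly and then through loop.run_in_executor() with the default executor. Note that the default executor is a thread pool that reuses its threads, so this only measures the hand-off to another thread and back; creating a fresh thread per call would cost more. Absolute numbers will vary by machine.

```python
import asyncio
import time

TEXT = "line one\nline two\nline three"


async def compare(n=1000):
    loop = asyncio.get_running_loop()

    # Run the cheap call directly on the event loop thread.
    start = time.perf_counter()
    for _ in range(n):
        direct = TEXT.splitlines()
    direct_time = time.perf_counter() - start

    # Hand the same call to the default thread pool and await the result.
    start = time.perf_counter()
    for _ in range(n):
        pooled = await loop.run_in_executor(None, TEXT.splitlines)
    pooled_time = time.perf_counter() - start

    return direct, pooled, direct_time, pooled_time


direct, pooled, direct_time, pooled_time = asyncio.run(compare())
print(f"direct:   {direct_time * 1e3:.2f} ms for 1000 calls")
print(f"executor: {pooled_time * 1e3:.2f} ms for 1000 calls")
```

Both paths produce the same list; the executor path typically takes far longer because every call crosses a thread boundary.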

You may say that those functions are not blocking by your standards. You didn't define 'blocking' here, but 'blocking' just means: won't voluntarily yield. If we narrow this down to "won't voluntarily yield when it has to wait for something and the computer could be doing something else instead", then the next question is: how would you detect that it should have yielded?

The answer is that you can't. time.sleep() is a blocking function for which you'd want to yield to the loop, but it's a C function call. Python can't know that time.sleep() is going to block for a while, because a function that calls time.sleep() looks up the name time in the global namespace, and then the attribute sleep on the result of that lookup, only when it actually executes the time.sleep() expression. Because Python's namespaces can be altered at any point during execution, you can't know what time.sleep() will do until you actually execute it.
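A small sketch of what "won't voluntarily yield" looks like in practice (blocking_nap, cooperative_nap and timed are illustrative names). Three coroutines that call time.sleep() run one after another because none of them gives control back to the loop, while three that await asyncio.sleep() overlap. The timings in the comments are approximate.

```python
import asyncio
import time


async def blocking_nap():
    time.sleep(0.1)            # never yields: the whole event loop stalls


async def cooperative_nap():
    await asyncio.sleep(0.1)   # yields to the loop while waiting


async def timed(nap, count=3):
    start = time.perf_counter()
    await asyncio.gather(*(nap() for _ in range(count)))
    return time.perf_counter() - start


blocking_elapsed = asyncio.run(timed(blocking_nap))
cooperative_elapsed = asyncio.run(timed(cooperative_nap))
print(f"time.sleep:    {blocking_elapsed:.2f}s")     # roughly 0.3s: naps run in sequence
print(f"asyncio.sleep: {cooperative_elapsed:.2f}s")  # roughly 0.1s: naps overlap
```

Nothing in the first version tells asyncio that time.sleep() is a good place to switch tasks; the coroutine simply has to await something that cooperates.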

You could say that the time.sleep() implementation should then automatically yield when called, but then you'd have to start identifying all such functions. There is no limit to the number of places you'd have to patch, and you can never know all of them, certainly not in third-party libraries. For example, the python-adb project gives you a synchronous USB connection to an Android device, using the libusb1 library. That's not a standard I/O code path, so how would Python know that creating and using those connections are good places to yield?

So you can't just assume that code needs to be run in an executor; not all code can be run in an executor, because it isn't thread-safe; and Python can't detect when code is blocking and should really be yielding.

So how do coroutines under asyncio cooperate? By using a task object for each logical piece of code that needs to run concurrently with other tasks, and by using future objects to signal to the task that the current logical piece of code wants to cede control to other tasks. That's what makes asyncio code asynchronous: voluntarily ceding control. When the loop gives control to one task out of many, the task executes a single 'step' of the coroutine call chain, until that call chain produces a future object. At that point the task adds a wakeup callback to the future's 'done' callback list and returns control to the loop. Later, when the future is marked done, the wakeup callback runs and the task executes another step of the coroutine call chain.
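The task/future hand-off described above can be observed with public asyncio APIs. In this minimal sketch (waiter, main and the events list are illustrative names), a task awaits a bare future, which suspends it; a timer callback scheduled with loop.call_later() later marks the future done, and the task resumes with the result.

```python
import asyncio

events = []


async def waiter(fut):
    events.append("waiter: about to await the future")
    # The task sees a pending future, registers its wakeup callback on it,
    # and hands control back to the event loop.
    value = await fut
    events.append(f"waiter: woke up with {value}")
    return value


async def main():
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    task = asyncio.create_task(waiter(fut))
    # Something else must mark the future done; here a timer callback does.
    loop.call_later(0.01, fut.set_result, 42)
    events.append("main: timer scheduled")
    return await task


result = asyncio.run(main())
for line in events:
    print(line)
```

The log also shows that the new task does not start running until main() itself yields to the loop at `await task`.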

Something else is responsible for marking the future objects as done. When you use asyncio.sleep(), a callback scheduled for a specific time is handed to the loop, and that callback marks the asyncio.sleep() future as done. When you use a stream object to perform I/O, then (on UNIX) the loop uses select-style calls to detect when a socket is ready, so that the future waiting on that I/O operation can be marked done. And when you use a lock or another synchronisation primitive, the primitive maintains a pile of futures to mark as 'done' when appropriate (Waiting for a lock? Add a future to the pile. Releasing a held lock? Pick the next future from the pile and mark it as done, so the next task that was waiting for the lock can wake up and acquire it, and so on).
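The "pile of futures" idea for locks can be sketched in a few lines. FifoLock below is a hypothetical, simplified class written for illustration, not asyncio.Lock's actual implementation: asyncio.Lock follows a similar idea but also handles cancellation and other details this sketch ignores.

```python
import asyncio
from collections import deque


class FifoLock:
    """Toy lock: one future per waiting task (hypothetical, no cancellation handling)."""

    def __init__(self):
        self._locked = False
        self._waiters = deque()  # pending asyncio.Future objects, oldest first

    async def acquire(self):
        if not self._locked and not self._waiters:
            self._locked = True
            return
        # Waiting for the lock? Add a future to the pile and suspend on it.
        fut = asyncio.get_running_loop().create_future()
        self._waiters.append(fut)
        await fut  # ownership is handed over directly by release()

    def release(self):
        if self._waiters:
            # Pick the next future from the pile and mark it done: that task wakes up
            # already owning the lock, so _locked stays True.
            self._waiters.popleft().set_result(None)
        else:
            self._locked = False


async def worker(name, lock, log):
    await lock.acquire()
    try:
        log.append(f"{name} acquired")
        await asyncio.sleep(0.01)  # hold the lock across a suspension point
    finally:
        lock.release()


async def main():
    lock = FifoLock()
    log = []
    await asyncio.gather(*(worker(n, lock, log) for n in "ABC"))
    return log


log = asyncio.run(main())
print(log)
```

Because the workers start in order and each waiter's future is completed in FIFO order, they acquire the lock one after another.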

Putting blocking synchronous code into an executor is just another form of cooperation here. When you use asyncio in a project, it is up to you as the developer to use the tools you are given to make sure your coroutines cooperate. You are free to use blocking open() calls on files instead of streams, and you are free to use an executor when you know the code needs to run in a separate thread to avoid blocking the loop for too long.
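Here is a minimal sketch of that explicit cooperation. legacy_blocking_call is a hypothetical stand-in for synchronous code you cannot rewrite; running it through loop.run_in_executor() lets a ticker task keep running on the loop in the meantime. On Python 3.9+, asyncio.to_thread() is a shorthand for the same pattern.

```python
import asyncio
import time


def legacy_blocking_call(seconds):
    # Hypothetical stand-in for synchronous code that blocks.
    time.sleep(seconds)
    return "done"


async def ticker(ticks, stop):
    # Proves the loop stays responsive: it keeps ticking while the call runs.
    while not stop.is_set():
        ticks.append(time.perf_counter())
        await asyncio.sleep(0.01)


async def main():
    loop = asyncio.get_running_loop()
    ticks = []
    stop = asyncio.Event()
    tick_task = asyncio.create_task(ticker(ticks, stop))
    # The blocking call runs in the default thread pool, not on the loop thread.
    result = await loop.run_in_executor(None, legacy_blocking_call, 0.2)
    stop.set()
    await tick_task
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)  # tick_count is typically around 20
```

Calling legacy_blocking_call(0.2) directly inside main() would instead freeze the ticker for the full 0.2 seconds.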

Last but not least, the whole point of using asyncio is to avoid threading as much as possible. Threads have downsides: code needs to be thread-safe (control can switch between threads anywhere, so two threads accessing a shared piece of data must do so with care, and 'taking care' can slow the code down). Threads execute whether or not they have anything to do; switching control between a fixed number of threads that all wait for I/O to happen is a waste of CPU time, whereas the asyncio loop is free to find a task that is not waiting.
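As a rough illustration of why avoiding threads matters, this sketch (fake_io and main are illustrative names) starts 10,000 concurrent waits with asyncio.sleep() standing in for I/O. They all complete on a single OS thread, in roughly the time of one wait plus scheduling overhead; spawning 10,000 threads for the same job would be far heavier.

```python
import asyncio
import threading
import time


async def fake_io(i):
    await asyncio.sleep(0.1)  # waits without occupying a thread
    return threading.get_ident()


async def main(n=10_000):
    start = time.perf_counter()
    thread_ids = await asyncio.gather(*(fake_io(i) for i in range(n)))
    return time.perf_counter() - start, len(thread_ids), len(set(thread_ids))


elapsed, finished, threads_used = asyncio.run(main())
print(f"{finished} tasks on {threads_used} thread(s) in {elapsed:.2f}s")
```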

Lonely孤独者°
Reply #3 · 2019-08-08 19:52

So aiohttp is based on asyncio?

Yes, it builds on asyncio's abstractions such as futures, transports and protocols, synchronization primitives, and so on.
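To show what "transports and protocols" means, here is a minimal sketch using only the standard asyncio low-level API on localhost. The class names EchoServer and OneShotClient are illustrative; this does not show how aiohttp itself is written, only the kind of building blocks asyncio provides. The client protocol completes a future when data arrives, tying protocols back to futures.

```python
import asyncio


class EchoServer(asyncio.Protocol):
    def connection_made(self, transport):
        self.transport = transport

    def data_received(self, data):
        self.transport.write(data)  # echo the bytes straight back


class OneShotClient(asyncio.Protocol):
    def __init__(self, message, on_reply):
        self.message = message
        self.on_reply = on_reply  # a future the caller awaits

    def connection_made(self, transport):
        transport.write(self.message)

    def data_received(self, data):
        if not self.on_reply.done():
            self.on_reply.set_result(data)


async def main():
    loop = asyncio.get_running_loop()
    # Port 0 asks the OS for any free port.
    server = await loop.create_server(EchoServer, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reply = loop.create_future()
    transport, _ = await loop.create_connection(
        lambda: OneShotClient(b"ping", reply), "127.0.0.1", port
    )
    data = await reply
    transport.close()
    server.close()
    await server.wait_closed()
    return data


echoed = asyncio.run(main())
print(echoed)
```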

Why doesn't asyncio offer non-blocking code with only coroutines then?

If you use asyncio APIs, that's exactly what it does. It offers non-blocking code to connect to a server, resolve a host name, create a server, and even run blocking code in a separate thread pool without blocking the event loop.
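A short sketch of a few of those non-blocking APIs working together, all on localhost (handle and main are illustrative names). loop.getaddrinfo() resolves a host name without blocking the loop (the default event loop delegates the blocking socket.getaddrinfo() call to its executor), asyncio.start_server() creates a server, and asyncio.open_connection() connects to it.

```python
import asyncio
import socket


async def handle(reader, writer):
    # Server side: read one line, send it back upper-cased.
    line = await reader.readline()
    writer.write(line.upper())
    await writer.drain()
    writer.close()
    await writer.wait_closed()


async def main():
    loop = asyncio.get_running_loop()
    # Non-blocking host name resolution.
    infos = await loop.getaddrinfo("localhost", 0, type=socket.SOCK_STREAM)

    # Non-blocking server creation (port 0 lets the OS pick a free port).
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    # Non-blocking client connection.
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello\n")
    await writer.drain()
    reply = await reader.readline()
    writer.close()
    await writer.wait_closed()

    server.close()
    await server.wait_closed()
    return len(infos), reply


n_infos, reply = asyncio.run(main())
print(n_infos, reply)
```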

aiohttp uses all this functionality to implement a capable HTTP client and server on top of asyncio.

Or did aiohttp implement this new event-loop (without OS-threads) itself?

No, aiohttp hooks into asyncio's event loop. More precisely, the application that uses aiohttp spins up the asyncio event loop and hooks aiohttp (and other asyncio-based libraries) into it.
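The division of labour can be sketched like this. library_fetch is a hypothetical function standing in for any asyncio-based library call: it does not create a loop of its own but calls asyncio.get_running_loop() and uses whatever loop the application started with asyncio.run().

```python
import asyncio


async def library_fetch(seen):
    # Hypothetical asyncio-based library code: it never creates its own loop,
    # it just uses whichever loop is currently running.
    seen["library_loop"] = asyncio.get_running_loop()
    await asyncio.sleep(0)  # any real I/O would be scheduled on that same loop
    return "payload"


async def application():
    seen = {"app_loop": asyncio.get_running_loop()}
    payload = await library_fetch(seen)
    return seen, payload


# The application, not the library, starts the event loop.
seen, payload = asyncio.run(application())
same_loop = seen["app_loop"] is seen["library_loop"]
print(same_loop, payload)
```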

Async/await are a language feature. Asyncio is an event-loop.

Async/await are a language feature, like generators. Asyncio is a library that uses them, like itertools. There are other libraries that use coroutines, e.g. curio and trio.
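To illustrate that async/await work without asyncio, here is a sketch that drives a coroutine by hand. Suspend and drive are hypothetical helpers written for this example: the awaitable yields a value out to whoever is driving the coroutine, and the driver resumes it with send(). This is the same language-level mechanism that asyncio, curio and trio each build their own event loops on; this particular awaitable would not work under asyncio's tasks, which expect awaitables to yield asyncio futures.

```python
class Suspend:
    """Minimal awaitable: suspends the coroutine once and hands a value to the driver."""

    def __init__(self, label):
        self.label = label

    def __await__(self):
        reply = yield self.label  # the driver receives self.label
        return reply              # becomes the value of the await expression


async def greet():
    name = await Suspend("need a name")
    return f"hello, {name}"


def drive(coro, answer):
    # A tiny hand-written driver: no asyncio involved.
    request = coro.send(None)  # run until the first suspension
    try:
        coro.send(answer)      # resume with the driver's answer
    except StopIteration as stop:
        return request, stop.value
    raise RuntimeError("coroutine suspended more than once")


request, result = drive(greet(), "world")
print(request)  # need a name
print(result)   # hello, world
```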
