By default, asyncio runs coroutines synchronously. If they contain blocking I/O code, they still wait for it to return. A way around this is loop.run_in_executor(), which runs the code in threads. If a thread blocks on I/O, another thread can start executing, so you don't waste time waiting for I/O calls.
If you use asyncio without executors, you lose those speedups. So I was wondering: why do you have to use executors explicitly? Why not enable them by default?
(In the following, I'll focus on http requests. But they really only serve as an example. I'm interested in the general principles.)
After some searching I found aiohttp. It's a library that essentially offers a combination of asyncio and requests: non-blocking HTTP calls. With executors, asyncio and requests behave pretty much just like aiohttp. Is there a reason to implement a new library? Do you pay a performance penalty for using executors?
This question was answered: Why doesn't asyncio always use executors?
Mikhail Gerasimov has explained to me that executors will spin up OS threads and they can become expensive. So it makes sense not to have them as default behaviour. aiohttp is better than using the requests module in an executor, since it offers non-blocking code with only coroutines.
Which brings me to this question. aiohttp advertises itself as:
Asynchronous HTTP Client/Server for asyncio and Python.
So aiohttp is based on asyncio? Why doesn't asyncio offer non-blocking code with only coroutines then? That would be the ideal default.
Or did aiohttp implement this new event loop (without OS threads) itself?
In that case I don't understand why they advertise themselves as based on asyncio. Async/await are a language feature. Asyncio is an event loop. And if aiohttp has its own event loop, there should be little intersection with asyncio. Actually, I would argue that such an event loop would be a much bigger feature than HTTP requests.
asyncio is asynchronous because coroutines cooperate voluntarily. All asyncio code must be written with cooperation in mind, that's the point entirely. Otherwise you may as well use threading exclusively to achieve concurrency.
You can't run 'blocking' functions (non-coroutine functions or methods that won't cooperate) in an executor automatically, because you can't just assume that that code can be run in a separate executor thread. Or even if it needs to be run in an executor.
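To make the idea of voluntary cooperation concrete, here is a minimal sketch. The function names (blocking_worker, cooperative_worker, other, run_pair) are made up for illustration. It records the order in which two tasks make progress when the first one either blocks with time.sleep() or yields with asyncio.sleep().

```python
import asyncio
import time


async def blocking_worker(log):
    log.append("blocking start")
    time.sleep(0.05)  # blocks the whole event loop; no other task can run
    log.append("blocking end")


async def cooperative_worker(log):
    log.append("cooperative start")
    await asyncio.sleep(0.05)  # yields to the loop while waiting
    log.append("cooperative end")


async def other(log):
    # A second task that only needs one short turn on the loop.
    log.append("other ran")


async def run_pair(first):
    """Run `first` and `other` concurrently and return the event order."""
    log = []
    await asyncio.gather(first(log), other(log))
    return log


blocking_log = asyncio.run(run_pair(blocking_worker))
cooperative_log = asyncio.run(run_pair(cooperative_worker))
print(blocking_log)
print(cooperative_log)
```

In the blocking case the second task only gets a turn after the first has finished completely; in the cooperative case it runs while the first is still waiting.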
The Python standard library is full of really useful code that asyncio projects will want to make use of. The majority of the standard library consists of regular, 'blocking' function and class definitions. They do their work quickly, so even though they 'block', they return in reasonable time.
But most of that code is also not thread-safe; it usually doesn't need to be. If asyncio ran all such code in an executor automatically, you couldn't use non-thread-safe functions any more. Besides, creating a thread to run synchronous code in is not free: creating the thread object costs time, and your OS won't let you run an infinite number of threads either. Loads of standard library functions and methods are fast, so why would you want to run str.splitlines() or urllib.parse.quote() in a separate thread when it would be much quicker to just execute the code and be done with it?
You may say that those functions are not blocking by your standards. You didn't define 'blocking' here, but 'blocking' just means: won't voluntarily yield. If we narrow this down to 'won't voluntarily yield when it has to wait for something and the computer could be doing something else instead', then the next question would be: how would you detect that it should have yielded?
The answer to that is that you can't.
time.sleep() is a blocking function where you'd want to yield to the loop, but that's a C function call. Python can't know that time.sleep() is going to block for longer, because a function that calls time.sleep() will look up the name time in the global namespace, and then the attribute sleep on the result of the name lookup, only when actually executing the time.sleep() expression. Because Python's namespaces can be altered at any point during execution, you can't know what time.sleep() will do until you actually execute the function.
You could say that the time.sleep() implementation should automatically yield when called then, but then you'd have to start identifying all such functions. There is no limit to the number of places you'd have to patch, and you can't ever know all the places, certainly not for third-party libraries. For example, the python-adb project gives you a synchronous USB connection to an Android device, using the libusb1 library. That's not a standard I/O codepath, so how would Python know that creating and using those connections are good places to yield?
So you can't just assume that code needs to be run in an executor, not all code can be run in an executor because it is not thread-safe, and Python can't detect when code is blocking and should really be yielding.
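The point about name lookup happening only at call time can be sketched as follows. The helper names nap and run_with_patched_sleep are hypothetical; the sketch rebinds time.sleep at runtime and shows that the unchanged nap() function then calls something entirely different, which is why its behaviour can't be determined ahead of time.

```python
import time


def nap():
    # `time` and then `.sleep` are looked up here, at call time,
    # not when the function is defined.
    time.sleep(0.01)


def run_with_patched_sleep():
    """Temporarily rebind time.sleep and return what nap() passed to it."""
    recorded = []
    original_sleep = time.sleep
    time.sleep = recorded.append  # any code may rebind the attribute at runtime
    try:
        nap()  # same source code, but it no longer sleeps at all
    finally:
        time.sleep = original_sleep  # always restore the real function
    return recorded


print(run_with_patched_sleep())
```

Nothing in the source of nap() changed between the two behaviours, so no static inspection of nap() could have decided whether it blocks.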
So how do coroutines under asyncio cooperate? By using task objects per logical piece of code that needs to run concurrently with other tasks, and by using future objects to signal to the task that the current logical piece of code wants to cede control to other tasks. That's what makes asynchronous asyncio code asynchronous: voluntarily ceding control. When the loop gives control to one task out of many, the task executes a single 'step' of the coroutine call chain, until that call chain produces a future object, at which point the task adds a wakeup callback to the future object's 'done' callback list and returns control to the loop. At some point later, when the future is marked done, the wakeup callback is run and the task will execute another coroutine call chain step.
Something else is responsible for marking the future objects as done. When you use asyncio.sleep(), a callback to be run at a specific time is given to the loop, where that callback would mark the asyncio.sleep() future as done. When you use a stream object to perform I/O, then (on UNIX) the loop uses select calls to detect when the I/O operation is done and it is time to wake up a future object. And when you use a lock or other synchronisation primitive, then the synchronisation primitive will maintain a pile of futures to mark as 'done' when appropriate (Waiting for a lock? Add a future to the pile. Freeing a held lock? Pick the next future from the pile and mark it as done, so the next task that was waiting for the lock can wake up and acquire the lock, etc.).
Putting synchronous code that blocks into an executor is just another form of cooperation here. When using asyncio in a project, it is up to the developer to use the tools provided to make sure the coroutines cooperate. You are free to use blocking open() calls on files instead of using streams, and you are free to use an executor when you know the code needs to be run in a separate thread to avoid blocking too long.
Last but not least, the whole point of using asyncio is to avoid using threading as much as possible. Using threads has downsides; code needs to be thread-safe (control can switch between threads anywhere, so two threads accessing a shared piece of data should do so with care, and 'taking care' can mean that the code is slowed down). Threads execute no matter if they have anything to do or not; switching control between a fixed number of threads that all wait for I/O to happen is a waste of CPU time, where the asyncio loop is free to find a task that is not waiting.
Yes, aiohttp builds on asyncio's abstractions such as futures, transports and protocols, synchronization primitives, and so on.
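As an illustration of the future abstraction, here is a rough sketch of a coroutine that waits on a bare future which a loop timer callback later marks as done. This is not asyncio's actual source, and wait_for_timer is a made-up name; it only uses the public calls loop.create_future() and loop.call_later().

```python
import asyncio


async def wait_for_timer(delay):
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    # Ask the loop to mark the future as done after `delay` seconds.
    loop.call_later(delay, future.set_result, "timer fired")
    # Awaiting the future hands control back to the loop until it is done.
    return await future


result = asyncio.run(wait_for_timer(0.01))
print(result)
```

Libraries built on asyncio can use the same pattern: something outside the coroutine (a timer, a socket becoming readable, a released lock) marks the future as done, and the waiting task resumes.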
If you use asyncio APIs, offering non-blocking code with only coroutines is exactly what asyncio does. It offers non-blocking code to connect to a server, resolve a host name, create a server, and even run blocking code in a separate thread pool without blocking the event loop.
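The last item, running blocking code in a thread pool without blocking the loop, might look like the sketch below. blocking_io and ticker are hypothetical stand-ins; passing None as the executor selects the loop's default thread pool.

```python
import asyncio
import time


def blocking_io(seconds):
    # Stand-in for any synchronous call that waits (file, socket, USB, ...).
    time.sleep(seconds)
    return f"slept {seconds}s"


async def main():
    loop = asyncio.get_running_loop()
    ticks = []

    async def ticker():
        # Keeps running on the event loop while the thread is busy.
        for i in range(3):
            ticks.append(i)
            await asyncio.sleep(0.01)

    # None selects the loop's default ThreadPoolExecutor.
    result, _ = await asyncio.gather(
        loop.run_in_executor(None, blocking_io, 0.05),
        ticker(),
    )
    return result, ticks


print(asyncio.run(main()))
```

The choice to use an executor is made explicitly at the call site, for a function known to block and known to be safe to run in another thread.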
aiohttp uses all this functionality to implement a capable HTTP client and server on top of asyncio.
No, aiohttp hooks into asyncio's event loop. More precisely, the application that uses aiohttp spins up the asyncio event loop and hooks aiohttp (and other asyncio-based libraries) into it.
Async/await are a language feature, like generators. Asyncio is a library that uses them, like itertools. There are other libraries that use coroutines, e.g. curio and trio.