In his 2014 talk on Tulip/Asyncio, Guido van Rossum shows the slide:
Tasks vs coroutines
And I'm completely missing the point.
From my point of view, both constructs are identical:

In case of a bare coroutine: it gets scheduled, so a task is created anyway, because the scheduler operates on Tasks; the calling coroutine is then suspended until the callee is done, after which it is free to continue execution.

In case of a Task: all the same happens; a new task is scheduled and the caller coroutine waits for its completion.

What is the difference in the way the code is executed in the two cases, and what practical impact does it have that a developer should consider?
P.S. Links to authoritative sources (GvR, PEPs, docs, core dev notes) would be much appreciated.
For the calling coroutine, yield from coroutine() feels like a function call (i.e. it will regain control when coroutine() finishes).

yield from Task(coroutine()), on the other hand, feels more like creating a new thread: Task() returns almost instantly, and very likely the caller gains control back before coroutine() finishes.

The difference between f() and th = threading.Thread(target=f, args=()); th.start(); th.join() is obvious, right?
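To make that analogy concrete, here is a minimal sketch using plain threads (the function and variable names are mine, purely for illustration):

import threading
import time

def f():
    time.sleep(1)
    print("f finished")

# Style 1: a plain call. The caller blocks until f() returns,
# just as yield from coroutine() suspends the caller until the coroutine is done.
f()

# Style 2: run f in a thread. The caller regains control immediately,
# just as Task(coroutine()) schedules the coroutine and returns at once.
th = threading.Thread(target=f)
th.start()
print("caller keeps running while f works")
th.join()  # explicit wait, analogous to yielding from the task later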
The point of using asyncio.Task(coro()) is for cases where you don't want to explicitly wait for coro, but you want coro to be executed in the background while you wait for other tasks. That is what Guido's slide means by:

[A] Task can make progress without waiting for it... as long as you wait for something else
Consider this example:
import asyncio

@asyncio.coroutine
def test1():
    print("in test1")

@asyncio.coroutine
def dummy():
    yield from asyncio.sleep(1)
    print("dummy ran")

@asyncio.coroutine
def main():
    test1()
    yield from dummy()

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Output:
dummy ran
As you can see, test1 was never actually executed, because we didn't explicitly yield from it: calling test1() only creates a generator object, and nothing runs until something iterates it.
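A minimal sketch of why nothing ran (the variable name gen is mine, for illustration only):

gen = test1()     # no "in test1" printed: this merely creates a generator object
print(type(gen))  # a plain generator (in non-debug mode); the body has not started
gen.close()       # discard it without ever executing the body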
Now, if we use asyncio.async to wrap a Task instance around test1, the result is different:
import asyncio

@asyncio.coroutine
def test1():
    print("in test1")

@asyncio.coroutine
def dummy():
    yield from asyncio.sleep(1)
    print("dummy ran")

@asyncio.coroutine
def main():
    asyncio.async(test1())
    yield from dummy()

loop = asyncio.get_event_loop()
loop.run_until_complete(main())
Output:
in test1
dummy ran
So, there's really no practical reason for using yield from asyncio.async(coro()): it's slower than yield from coro() without any benefit. It introduces the overhead of adding coro to the internal asyncio scheduler, which isn't needed, since yield from already guarantees that coro is going to execute. If you just want to call a coroutine and wait for it to finish, just yield from the coroutine directly.
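A minimal sketch of the equivalence, in the same Python 3.4-era style as the rest of this answer (the coroutine name compute is mine, for illustration); both forms bind the same result, the second just pays for an extra Task:

import asyncio

@asyncio.coroutine
def compute():
    yield from asyncio.sleep(0.1)
    return 42

@asyncio.coroutine
def main():
    a = yield from compute()                 # direct: no Task created
    b = yield from asyncio.async(compute())  # same result, extra scheduling overhead
    print(a, b)  # 42 42

loop = asyncio.get_event_loop()
loop.run_until_complete(main())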
Side note: I'm using asyncio.async* instead of Task directly because the docs recommend it:

Don’t directly create Task instances: use the async() function or the BaseEventLoop.create_task() method.
* Note that as of Python 3.4.4, asyncio.async is deprecated in favor of asyncio.ensure_future.
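On those newer versions the example above works the same with only the rename applied, e.g. (a sketch, assuming just that substitution):

@asyncio.coroutine
def main():
    asyncio.ensure_future(test1())  # equivalent to asyncio.async(test1())
    yield from dummy()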
As described in PEP 380, the accepted PEP that introduced yield from, the expression res = yield from f() grew out of the idea of the following loop:

for res in f():
    yield res

(with the refinement that res ultimately receives the generator's return value).
With this, things become very clear: if f() is some_coroutine(), then the coroutine is executed. On the other hand, if f() is Task(some_coroutine()), then Task.__init__ is executed instead; some_coroutine() is not executed, and only the newly created generator object is passed as the first argument to Task.__init__.
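A minimal sketch of that difference (the coroutine body and print calls are mine, for illustration); creating the Task runs only Task.__init__, and the generator does not start until the loop runs:

import asyncio

@asyncio.coroutine
def some_coroutine():
    print("body started")
    yield from asyncio.sleep(0)
    return "done"

loop = asyncio.get_event_loop()
task = asyncio.Task(some_coroutine())   # only Task.__init__ executes; nothing printed yet
print("task created, body not started")
print(loop.run_until_complete(task))    # "body started" appears only now, then "done"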
Conclusion:

res = yield from some_coroutine() => the some_coroutine() generator is driven by the caller, which suspends until it finishes; res receives its return value.

res = yield from Task(some_coroutine()) => a new task is created, which stores a not-yet-executed some_coroutine() generator object; the caller then waits for that task to complete.