Python asyncio/aiohttp: What are the requirements

Posted 2019-05-18 12:09

Question:

The python documentation for connection_lost states:

connection_made() and connection_lost() are called exactly once per successful connection.

Further down there's also the following state machine:

start -> connection_made() [-> data_received() *] [-> eof_received() ?] -> connection_lost() -> end

Also, the documentation for BaseTransport.close() states:

After all buffered data is flushed, the protocol’s connection_lost() method will be called with None as its argument.

and the documentation for WriteTransport.abort() states:

The protocol’s connection_lost() method will eventually be called with None as its argument.

This seems to me to indicate the following responsibilities:

  1. The transport must, if it has called connection_made(), later also call connection_lost() on the protocol (regardless of whether the connection is lost because of a call to close(), a call to abort() or an issue with the underlying connection).
  2. The protocol must not assume that I/O has finished when a call to close() or abort() returns. It must wait for the call to connection_lost(). In particular, after close() or abort() returns, there may be work relating to the transport still scheduled on the event loop.
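A minimal sketch of point 2, using only a plain (non-SSL) asyncio transport over a local socket pair. The names RecordingProtocol, closed and demo are hypothetical and exist only for illustration; none of this is taken from aiohttp or from asyncio's SSL code. It shows that connection_lost() has not yet run when close() returns, and that the caller can wait for it through a future:

```python
import asyncio
import socket


class RecordingProtocol(asyncio.Protocol):
    """Hypothetical protocol: records callbacks and exposes a 'closed' future."""

    def __init__(self, loop):
        self.events = []
        self.closed = loop.create_future()

    def connection_made(self, transport):
        self.events.append("connection_made")

    def connection_lost(self, exc):
        self.events.append("connection_lost")
        if not self.closed.done():
            self.closed.set_result(exc)


async def demo():
    loop = asyncio.get_running_loop()
    # A connected socket pair stands in for a real network connection.
    ours, theirs = socket.socketpair()
    try:
        transport, protocol = await loop.create_connection(
            lambda: RecordingProtocol(loop), sock=ours)

        transport.close()
        # close() has returned, but connection_lost() is still queued on the loop.
        events_after_close = list(protocol.events)

        # Wait for the transport to report that it has really finished.
        exc = await protocol.closed
        return events_after_close, protocol.events, exc
    finally:
        theirs.close()


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

If my reading of the documentation is right, any code that discards the transport straight after close(), without something equivalent to awaiting closed, is relying on the event loop continuing to run for long enough afterwards.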

With that in mind, consider the following trivial aiohttp client program, using SSL:

import aiohttp
import asyncio

async def main():
    # Disable certificate verification (newer aiohttp releases spell this ssl=False).
    conn = aiohttp.TCPConnector(verify_ssl=False)
    async with aiohttp.ClientSession(connector=conn) as session:
        async with session.post('https://whatevs/') as resp:
            resp.raise_for_status()

loop = asyncio.get_event_loop()
try:
    loop.run_until_complete(main())
finally:
    loop.close()

Running this on my (Windows) machine appears to work correctly. However, if I put breakpoints or print statements into the connection_made() and connection_lost() methods of aiohttp's ResponseHandler class (a protocol implementation), I see that connection_made() is called but connection_lost() is not.

The transport used is _SSLProtocolTransport, defined in asyncio's sslproto.py file. Its close() method is called, and it sets off a shutdown process. Due to the nature of SSL this shutdown process is necessarily asynchronous, and the expectation appears to be that once the shutdown is complete the SSLProtocol underlying the _SSLProtocolTransport would, from its _finalize() method, close its underlying transport. This would then cause a call to connection_lost to bubble up the stack. However, none of this asynchronous stuff actually happens. aiohttp appears to just call close() and immediately discard the _SSLProtocolTransport (the method where it does this is not even a coroutine), and the transport never progresses with its shutdown sequence and never calls connection_lost().

So my question is: is this a bug in aiohttp and/or asyncio's SSL protocol/transport, or am I misinterpreting the documentation as regards the responsibilities of the transport and protocol?

Why I'm Asking This

The reason for this question is that I have written an SSL transport of my own, to allow me to use PyOpenSSL with asyncio, instead of the standard library ssl module. In my implementation, after the call to my close() method returns, there are still callbacks queued on the event loop (scheduled with call_soon()). This is necessary in order for the asynchronous shutdown sequence to be performed correctly, and I expect the protocol to give my transport a chance to complete the process and call connection_lost().

When I use my transport with aiohttp, the __aexit__ method of the ClientSession created in the code above calls its own close() method (not a coroutine), which causes my transport to be closed, without waiting for connection_lost(). The event loop is then closed and the module finalised while the transport is still alive and performing I/O, resulting in a variety of errors.

I'm trying to figure out whether this is my fault or a bug in aiohttp (and perhaps also asyncio's SSL transport). If it's my fault, I need to know how I'm supposed to perform this asynchronous shutdown. I could in principle handle it at the top level by running the event loop until it's empty before calling loop.close(), but I don't see any way to do that (there's Task.all_tasks() but that doesn't work for things scheduled with call_soon). Even if I can do that somehow, it would seem exceptionally ugly and is certainly not described as a standard requirement for shutting down after such work in any documentation I've seen for asyncio or aiohttp.
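A minimal sketch of that top-level workaround, under stated assumptions: start_fake_shutdown is a hypothetical stand-in for a transport whose close() leaves a chain of call_soon() callbacks behind, and drain simply yields to the loop a fixed number of times. The iteration count is a guess, since nothing here can tell when the loop is actually "empty", which is exactly why this approach feels fragile:

```python
import asyncio


def start_fake_shutdown(loop, steps, log):
    """Hypothetical stand-in for a multi-step shutdown driven by call_soon()."""

    def step(remaining):
        log.append(remaining)
        if remaining > 0:
            loop.call_soon(step, remaining - 1)

    loop.call_soon(step, steps)


async def main(log):
    loop = asyncio.get_running_loop()
    # Comparable to close() returning while shutdown work is still queued.
    start_fake_shutdown(loop, 3, log)


async def drain(iterations):
    # Each sleep(0) yields once, letting already-queued callbacks run.
    for _ in range(iterations):
        await asyncio.sleep(0)


def run_demo():
    log = []
    loop = asyncio.new_event_loop()
    try:
        loop.run_until_complete(main(log))
        before = list(log)  # shutdown steps completed so far
        loop.run_until_complete(drain(10))  # 10 is an arbitrary guess
    finally:
        loop.close()
    return before, log


if __name__ == "__main__":
    print(run_demo())
```

This only works when the remaining work is a short chain of call_soon() callbacks. A shutdown that waits on real I/O, such as an SSL close handshake with the peer, would need a timed wait or an explicit completion signal instead.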

Answer 1:

I suggest you create an issue in the aiohttp bug tracker and copy your question into it. IMHO Stack Overflow is not the best place for discussing questions like this.