In an earlier question, one of the authors of aiohttp kindly suggested a way to fetch multiple URLs with aiohttp using the new async with syntax from Python 3.5:
import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def fetch_all(session, urls, loop):
    results = await asyncio.wait([loop.create_task(fetch(session, url))
                                  for url in urls])
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # breaks because of the first url
    urls = ['http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
            'http://google.com',
            'http://twitter.com']
    with aiohttp.ClientSession(loop=loop) as session:
        the_results = loop.run_until_complete(
            fetch_all(session, urls, loop))
        # do something with the the_results
However, when one of the session.get(url) requests fails (as above, because of http://SDFKHSKHGKLHSKLJHGSDFKSJH.com), the error is not handled and the whole thing breaks.
I looked for ways to insert tests on the result of session.get(url), for instance a place for a try ... except ... or for an if response.status != 200:, but I just do not understand how to work with async with, await and the various objects.
Since async with is still very new, there are not many examples. It would be very helpful to many people if an asyncio wizard could show how to do this. After all, one of the first things most people will want to test with asyncio is getting multiple resources concurrently.
Goal
The goal is that we can inspect the_results and quickly see either:
- this url failed (and why: status code, maybe exception name), or
- this url worked, and here is a useful response object
I would use gather instead of wait. With return_exceptions=True, gather returns exceptions as objects instead of raising them. Then you can check each result to see whether it is an instance of some exception.
import aiohttp
import asyncio

async def fetch(session, url):
    with aiohttp.Timeout(10):
        async with session.get(url) as response:
            return await response.text()

async def fetch_all(session, urls, loop):
    results = await asyncio.gather(
        *[fetch(session, url) for url in urls],
        return_exceptions=True  # default is false, that would raise
    )
    # for testing purposes only
    # gather returns results in the order of coros
    for idx, url in enumerate(urls):
        print('{}: {}'.format(url, 'ERR' if isinstance(results[idx], Exception) else 'OK'))
    return results

if __name__ == '__main__':
    loop = asyncio.get_event_loop()
    # breaks because of the first url
    urls = [
        'http://SDFKHSKHGKLHSKLJHGSDFKSJH.com',
        'http://google.com',
        'http://twitter.com']
    with aiohttp.ClientSession(loop=loop) as session:
        the_results = loop.run_until_complete(
            fetch_all(session, urls, loop))
Tests:
$ python test.py
http://SDFKHSKHGKLHSKLJHGSDFKSJH.com: ERR
http://google.com: OK
http://twitter.com: OK
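To make the "check each result" step concrete, here is a minimal sketch that separates successes from failures after gather. It uses only the standard library so it runs without a network: fake_fetch and split_results are made-up names for illustration, fake_fetch stands in for the real fetch(session, url), and asyncio.run assumes Python 3.7 or newer.

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for fetch(session, url): no network involved.
    await asyncio.sleep(0)
    if "bad" in url:
        raise OSError("Cannot connect to host {}".format(url))
    return "<html>{}</html>".format(url)


async def fetch_all(urls):
    # return_exceptions=True makes gather hand back exceptions as values,
    # in the same order as the awaitables it was given.
    return await asyncio.gather(
        *[fake_fetch(url) for url in urls],
        return_exceptions=True
    )


def split_results(urls, results):
    # Pair each URL with its result and sort the pairs into two dicts.
    ok, failed = {}, {}
    for url, result in zip(urls, results):
        if isinstance(result, Exception):
            failed[url] = result
        else:
            ok[url] = result
    return ok, failed


urls = ["http://bad.example", "http://good.example"]
results = asyncio.run(fetch_all(urls))
ok, failed = split_results(urls, results)
print("ok:", sorted(ok))
print("failed:", {url: type(exc).__name__ for url, exc in failed.items()})
```

Because gather keeps the input order, zip(urls, results) is enough to match each URL to its outcome.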
I am far from an asyncio expert, but if you want to catch the error, you need to catch a socket error:
import socket

async def fetch(session, url):
    with aiohttp.Timeout(10):
        try:
            async with session.get(url) as response:
                print(response.status == 200)
                return await response.text()
        except socket.error as e:
            print(e.strerror)
Running the code and printing the_results:
Cannot connect to host sdfkhskhgklhskljhgsdfksjh.com:80 ssl:False [Can not connect to sdfkhskhgklhskljhgsdfksjh.com:80 [Name or service not known]]
True
True
({<Task finished coro=<fetch() done, defined at <ipython-input-7-535a26aaaefe>:5> result='<!DOCTYPE ht...y>\n</html>\n'>, <Task finished coro=<fetch() done, defined at <ipython-input-7-535a26aaaefe>:5> result=None>, <Task finished coro=<fetch() done, defined at <ipython-input-7-535a26aaaefe>:5> result='<!doctype ht.../body></html>'>}, set())
You can see that we catch the error and that the other calls still succeed, returning the HTML.
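The tuple printed above is what asyncio.wait returns: a set of finished tasks and a set of pending ones. Sets are unordered, so the tasks do not line up with the URL list. A rough sketch of one way to map them back, using only the standard library (fake_fetch is a made-up stand-in for fetch, and asyncio.run and asyncio.create_task assume Python 3.7 or newer):

```python
import asyncio


async def fake_fetch(url):
    # Hypothetical stand-in for fetch(): returns None on failure, like the
    # version above that catches the error and only prints it.
    await asyncio.sleep(0)
    if "bad" in url:
        return None
    return "<html>{}</html>".format(url)


async def fetch_all(urls):
    # Keep our own url -> task mapping, because the "done" set is unordered.
    tasks = {url: asyncio.create_task(fake_fetch(url)) for url in urls}
    done, pending = await asyncio.wait(list(tasks.values()))
    # With no timeout given, wait returns only once every task has finished,
    # so calling result() on each task is safe here.
    return {url: task.result() for url, task in tasks.items()}, pending


by_url, pending = asyncio.run(
    fetch_all(["http://bad.example", "http://good.example"]))
for url, body in by_url.items():
    print(url, "ERR" if body is None else "OK")
```

Note that a failed URL shows up only as None here, because the exception was swallowed inside the coroutine.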
We should probably be catching OSError instead, as socket.error has been a deprecated alias of OSError since Python 3.3:
async def fetch(session, url):
    with aiohttp.Timeout(10):
        try:
            async with session.get(url) as response:
                return await response.text()
        except OSError as e:
            print(e)
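The alias relationship is easy to check with the standard library alone. The snippet below is just a quick illustration: the error it raises is constructed by hand, not produced by a real lookup.

```python
import socket

# socket.error is the very same class object as OSError (since Python 3.3).
same_class = socket.error is OSError

# DNS failures raise socket.gaierror, which is a subclass of OSError.
gai_is_oserror = issubclass(socket.gaierror, OSError)

try:
    # Hand-made error, only to show that "except OSError" catches it.
    raise socket.gaierror(-2, "Name or service not known")
except OSError as e:
    caught = "{}: {}".format(type(e).__name__, e.strerror)

print(same_class, gai_is_oserror, caught)
```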
If you also want to check that the response status is 200, put your if inside the try as well. You can use the reason attribute to get more information:
async def fetch(session, url):
    with aiohttp.Timeout(10):
        try:
            async with session.get(url) as response:
                if response.status != 200:
                    print(response.reason)
                return await response.text()
        except OSError as e:
            print(e.strerror)
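Instead of printing, the status check and the exception handler could each return a small record, so the_results says for every URL whether it worked and why not. The following is only a sketch under assumptions: FetchResult and fake_get are made-up names, fake_get stands in for session.get(url) plus response.text() so the example runs without a network, and asyncio.run needs Python 3.7 or newer.

```python
import asyncio
from collections import namedtuple

# Hypothetical record type: one per URL, whether it worked or not.
FetchResult = namedtuple("FetchResult", "url ok detail body")


async def fake_get(url):
    # Hypothetical stand-in for session.get(url) plus response.text();
    # returns (status, reason, body) or raises OSError.
    await asyncio.sleep(0)
    if "unreachable" in url:
        raise OSError(-2, "Name or service not known")
    if "missing" in url:
        return 404, "Not Found", ""
    return 200, "OK", "<html>hello</html>"


async def fetch(url):
    try:
        status, reason, body = await fake_get(url)
    except OSError as e:
        # Connection-level failure: report the exception name and message.
        detail = "{}: {}".format(type(e).__name__, e.strerror)
        return FetchResult(url, False, detail, None)
    if status != 200:
        # The server answered, but not with 200.
        return FetchResult(url, False, "{} {}".format(status, reason), None)
    return FetchResult(url, True, "200 OK", body)


async def fetch_all(urls):
    # fetch() never raises OSError itself, so a plain gather is enough here.
    return await asyncio.gather(*[fetch(url) for url in urls])


urls = ["http://unreachable.example",
        "http://missing.example",
        "http://good.example"]
the_results = asyncio.run(fetch_all(urls))
for r in the_results:
    print(r.url, "OK" if r.ok else "ERR", r.detail)
```

With real aiohttp code, the same shape would apply: build the record from response.status and response.reason inside the async with block, and from the caught exception in the except branch.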