When downloading a large file with Python, I want to put a time limit not only on the connection process, but also on the download itself.
I am trying the following Python code:
import requests
r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip', timeout = 0.5, prefetch = False)
print r.headers['content-length']
print len(r.raw.read())
This does not work (the download is not time-limited), as correctly noted in the docs: https://requests.readthedocs.org/en/latest/user/quickstart/#timeouts
This would be great if it were possible:
r.raw.read(timeout = 10)
The question is: how do I put a time limit on the download?
And the answer is: do not use requests, as it is blocking. Use non-blocking network I/O, for example eventlet:
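A minimal sketch of that approach, assuming eventlet's monkey-patching is acceptable and reusing the URL from the question; the 10-second limit is an arbitrary choice, and stream=True is the name newer requests versions use for prefetch=False:

import eventlet
eventlet.monkey_patch()   # make socket I/O cooperative so Timeout can interrupt it

import requests

try:
    # give the whole request plus download at most 10 seconds
    with eventlet.Timeout(10):
        r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                         stream=True)
        data = r.raw.read()
        print(len(data))
except eventlet.Timeout:
    print('download timed out')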
This produces the expected result: the download is aborted once the timeout expires.
Run the download in a thread which you can then abort if it has not finished on time.
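A rough sketch of this approach. Note that a Python thread cannot actually be killed, so the main thread simply stops waiting for it; the daemon flag lets the program exit anyway. The 10-second limit and the use of stream=True (the newer spelling of prefetch=False) are illustrative choices:

import threading
import requests

result = {}

def download(url):
    r = requests.get(url, stream=True)   # do not preload the body
    result['data'] = r.raw.read()

t = threading.Thread(target=download,
                     args=('http://ipv4.download.thinkbroadband.com/1GB.zip',))
t.daemon = True        # do not keep the interpreter alive for a stuck download
t.start()
t.join(timeout=10)     # wait at most 10 seconds for the download

if t.is_alive():
    print('download did not finish in time')
else:
    print(len(result['data']))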
When using Requests' prefetch=False parameter, you get to pull in arbitrary-sized chunks of the response at a time (rather than all at once). What you'll need to do is tell Requests not to preload the entire response and keep track of how much time you've spent reading so far, while fetching small chunks at a time. You can fetch a chunk using r.raw.read(CHUNK_SIZE). Overall, the code will look something like this:
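A sketch of that idea; the chunk size and the exact bookkeeping are illustrative choices, and stream=True is what newer requests versions call prefetch=False:

import time
import requests

CHUNK_SIZE = 4096      # bytes per read; an arbitrary, smallish value
TIME_LIMIT = 5         # seconds allowed for the whole download

r = requests.get('http://ipv4.download.thinkbroadband.com/1GB.zip',
                 stream=True)          # do not preload the body

data = b''
deadline = time.time() + TIME_LIMIT
while True:
    chunk = r.raw.read(CHUNK_SIZE)
    if not chunk:                      # server finished sending the body
        break
    data += chunk
    if time.time() > deadline:         # our own clock, not a socket timeout
        break

r.close()
print('read %d bytes' % len(data))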
Note that this might sometimes use a bit more than the 5 seconds allotted, as the final r.raw.read(...) could lag an arbitrary amount of time. But at least it doesn't depend on multithreading or socket timeouts.