I'm gathering statistics on a list of websites and I'm using requests for it for simplicity. Here is my code:
```python
import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r = requests.get(w, verify=False)
    data.append((r.url, len(r.content), r.elapsed.total_seconds(),
                 str([(l.status_code, l.url) for l in r.history]),
                 str(r.headers.items()), str(r.cookies.items())))
```
Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.
This question has come up before, but none of the answers are clean. I will be putting a bounty on this to get a nice answer.
I hear that maybe not using requests is a good idea, but then how would I get the nice things requests offers (the ones in the tuple)?
Well, I tried many of the solutions on this page and still faced instability, random hangs, and poor connection performance.
I'm now using curl and I'm really happy with its "max time" functionality and with the overall performance, even with such a crude implementation:
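The original snippet isn't included above, so here is a minimal sketch of what such a setup might look like, assuming curl is invoked from Python via subprocess; the fetch helper and the write-out format are my own illustration, not the answer's code. --max-time caps the whole transfer, connection included, and curl exits with code 28 when it is hit.

```python
import subprocess

def fetch(url, max_time=6):
    # Illustrative helper: run curl with a hard 6-second ceiling on the whole transfer.
    cmd = [
        'curl', '--silent', '--location',
        '--max-time', str(max_time),             # connection + transfer time budget
        '--output', '/dev/null',                 # discard the body
        '--write-out', '%{url_effective} %{http_code} %{size_download} %{time_total}',
        url,
    ]
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode, result.stdout      # returncode 28 == curl's "operation timed out"

print(fetch('http://google.com'))
```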
Here, I defined a 6-second max-time parameter, covering both connection and transfer time.
I'm sure curl has a nice Python binding, if you prefer to stick to a Pythonic syntax :)
I came up with a more direct solution that is admittedly ugly but fixes the real problem. It goes a bit like this:
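The code itself is missing above, so here is a sketch of one way to bound the time for the entire response, assuming the idea is to stream the body and enforce a wall-clock deadline yourself (this is my reconstruction, not necessarily the answer's original snippet):

```python
import time
import requests

def get_with_deadline(url, deadline=10, per_read_timeout=5):
    # per_read_timeout guards the connect and each socket read; the deadline
    # guards the whole download, which timeout= alone does not do.
    start = time.monotonic()
    r = requests.get(url, timeout=per_read_timeout, stream=True)
    content = b''
    for chunk in r.iter_content(chunk_size=8192):
        content += chunk
        if time.monotonic() - start > deadline:
            r.close()
            raise TimeoutError('whole-response deadline of %ss exceeded' % deadline)
    return content
```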
You can read the full explanation here
This code works for the Windows socket errors 11004 and 10060:
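The code block is not shown above; a plausible sketch, assuming the idea is simply to catch the exceptions requests raises when the underlying socket fails with those errors (the safe_get helper is my own naming):

```python
import requests

def safe_get(url, timeout=10):
    try:
        return requests.get(url, timeout=timeout)
    except requests.exceptions.Timeout:
        print('timed out: %s' % url)                      # e.g. error 10060 underneath
    except requests.exceptions.ConnectionError as exc:
        print('connection error: %s (%s)' % (url, exc))   # e.g. DNS failure 11004
    return None
```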
Just another solution (got it from http://docs.python-requests.org/en/master/user/advanced/#streaming-uploads).
Before downloading the body, you can find out the content size:
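A sketch of that idea, assuming the goal is to peek at the advertised size with stream=True before committing to the download (the URL and threshold are illustrative, not from the original answer):

```python
import requests

TOO_LONG = 10 * 1024 * 1024   # 10 MB threshold, illustrative

r = requests.get('http://example.com/big.file', stream=True, timeout=10)
size = int(r.headers.get('Content-Length', 0))
if size > TOO_LONG:
    r.close()                 # give up without downloading the body
else:
    body = r.content          # small enough, fetch it
```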
But be careful: a sender can set an incorrect value in the Content-Length response field.
There is a package called timeout-decorator that you can use to time out any Python function.
It uses the signals approach that some answers here suggest. Alternatively, you can tell it to use multiprocessing instead of signals (e.g. if you are in a multi-threaded environment). The timeout is given as a number of seconds; see the sketch below.
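A sketch of assumed usage of timeout-decorator around requests.get (the fetch helper and the 10-second value are my own choices, not from the original answer):

```python
import requests
import timeout_decorator

# Pass use_signals=False, i.e. @timeout_decorator.timeout(10, use_signals=False),
# to switch to the multiprocessing backend in multi-threaded code.
@timeout_decorator.timeout(10)   # seconds; raises timeout_decorator.TimeoutError on expiry
def fetch(url):
    return requests.get(url, verify=False)

try:
    r = fetch('http://google.com')
except timeout_decorator.TimeoutError:
    print('request timed out')
```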
Since requests >= 2.4.0, you can use the timeout argument of requests, as shown below. Note: timeout is not a time limit on the entire response download; rather, an exception is raised if the server has not issued a response for timeout seconds (more precisely, if no bytes have been received on the underlying socket for timeout seconds).
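For example (the URL and the 10-second value are illustrative; requests raises requests.exceptions.Timeout when the limit is hit):

```python
import requests

try:
    r = requests.get('http://google.com', timeout=10)   # seconds
except requests.exceptions.Timeout:
    print('no response within 10 seconds')
```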