I have code for reading a URL like this:
from urllib2 import Request, urlopen
req = Request(url)
for key, val in headers.items():
    req.add_header(key, val)
res = urlopen(req, timeout=timeout)
# This line blocks
content = res.read()
The timeout works for the urlopen() call. But then the code gets to the res.read() call, where I want to read the response data, and the timeout isn't applied there. So the read call may hang almost forever, waiting for data from the server. The only solution I've found is to use a signal to interrupt the read(), which is not suitable for me since I'm using threads.
What other options are there? Is there an HTTP library for Python that handles read timeouts? I've looked at httplib2 and requests, and they seem to suffer from the same issue as above. I don't want to write my own non-blocking network code using the socket module, because I think there should already be a library for this.
Update: None of the solutions below are doing it for me. You can see for yourself that setting the socket or urlopen timeout has no effect when downloading a large file:
from urllib2 import urlopen
url = 'http://iso.linuxquestions.org/download/388/7163/http/se.releases.ubuntu.com/ubuntu-12.04.3-desktop-i386.iso'
c = urlopen(url)
c.read()
At least on Windows with Python 2.7.3, the timeouts are being completely ignored.
Any asynchronous network library should allow you to enforce a total timeout on any I/O operation, e.g., here's a gevent code example:
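(A minimal sketch of that idea; the URL and the 2-second budget are placeholders.)
import gevent
import gevent.monkey  # $ pip install gevent
gevent.monkey.patch_all()  # patch the socket module so gevent can interrupt blocking I/O

import urllib2

url = 'http://localhost:8000'  # placeholder URL
try:
    with gevent.Timeout(2):  # total budget for connect + read
        content = urllib2.urlopen(url).read()
except gevent.Timeout:
    content = None  # the whole request took longer than 2 seconds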
And here's the asyncio equivalent:
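(A sketch of the same idea using aiohttp on Python 3.7+; the URL and the 2-second timeout are placeholders.)
import asyncio
import aiohttp  # $ pip install aiohttp

async def fetch(url):
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    try:
        # wait_for() enforces a total timeout covering connect and read
        text = await asyncio.wait_for(fetch('http://localhost:8000'), timeout=2)
        print(text)
    except asyncio.TimeoutError:
        print('timed out')

asyncio.run(main())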
The test HTTP server is defined here.
Had the same issue with socket timeout on the read statement. What worked for me was putting both the urlopen and the read inside a try statement. Hope this helps!
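For example, something along these lines (the URL and the 10-second timeout are placeholders):
import socket
from urllib2 import Request, urlopen, URLError

url = 'http://example.com/big-file'  # placeholder URL
try:
    res = urlopen(Request(url), timeout=10)
    content = res.read()  # a stalled read can also raise socket.timeout here
except socket.timeout:
    content = None  # no data arrived within the timeout
except URLError:
    content = None  # connection-level failure (DNS error, refused connection, ...)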
I'd expect this to be a common problem, and yet - no answers to be found anywhere... Just built a solution for this using a timeout signal:
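(A sketch of that approach; signal.alarm() only works in the main thread on Unix, and the 10-second alarm is a placeholder.)
import signal
import urllib2

class TimeoutError(Exception):
    pass

def handler(signum, frame):
    raise TimeoutError()

signal.signal(signal.SIGALRM, handler)  # install the alarm handler
signal.alarm(10)                        # deliver SIGALRM after 10 seconds
try:
    content = urllib2.urlopen('http://example.com/big-file').read()
except TimeoutError:
    content = None                      # the download did not finish in time
finally:
    signal.alarm(0)                     # always cancel the pending alarm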
The credit for the signal part of the solution goes here btw: python timer mystery
This isn't the behavior I see. I get a URLError when the call times out. Can't you catch this error and then avoid trying to read res? When I try to use res.read() after this I get NameError: name 'res' is not defined. Is something like this what you need?
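(A sketch; the URL and the 10-second timeout are placeholders.)
from urllib2 import urlopen, URLError

url = 'http://example.com/big-file'  # placeholder URL
try:
    res = urlopen(url, timeout=10)
except URLError:
    content = None  # urlopen() itself failed or timed out, so there is nothing to read
else:
    content = res.read()  # only attempt the read if urlopen() succeeded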
I suppose the way to implement a timeout manually is via multiprocessing, no? If the job hasn't finished you can terminate it.
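(A rough sketch of that idea; the URL and the 10-second limit are placeholders.)
import multiprocessing
from urllib2 import urlopen

def download(url):
    data = urlopen(url).read()  # blocks in the child process
    # ... store or process the data here, e.g. write it to a file

if __name__ == '__main__':
    proc = multiprocessing.Process(target=download,
                                   args=('http://example.com/big-file',))
    proc.start()
    proc.join(10)         # give the whole job 10 seconds
    if proc.is_alive():
        proc.terminate()  # the job did not finish in time: kill it
        proc.join()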
The pycurl.TIMEOUT option works for the whole request:
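(A sketch; the URL and the 2-second limits are placeholders.)
import pycurl  # $ pip install pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://localhost:8000')  # placeholder URL
c.setopt(pycurl.CONNECTTIMEOUT, 2)             # limit on establishing the connection
c.setopt(pycurl.TIMEOUT, 2)                    # limit on the whole transfer, reads included
c.setopt(pycurl.WRITEFUNCTION, buf.write)
try:
    c.perform()                                # raises pycurl.error if a limit is exceeded
except pycurl.error:
    content = None
else:
    content = buf.getvalue()
finally:
    c.close()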
The code raises the timeout error in ~2 seconds. I've tested the total read timeout with a small test server, slow_http_server.py, that sends the response in multiple chunks, with a delay between chunks that is shorter than the timeout:
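(A minimal sketch of such a server; the port, chunk count, and delays are placeholders.)
import time
import BaseHTTPServer

class SlowHandler(BaseHTTPServer.BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header('Content-Type', 'text/plain')
        self.end_headers()
        for i in range(10):                     # send the body in 10 small chunks
            self.wfile.write('chunk %d\n' % i)
            self.wfile.flush()
            time.sleep(1)                       # 1-second pause between chunks

if __name__ == '__main__':
    BaseHTTPServer.HTTPServer(('localhost', 8000), SlowHandler).serve_forever()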
I've tested the total connection timeout with http://google.com:22222.
One possible (imperfect) solution is to set the global socket timeout, explained in more detail here:
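(A sketch; the 10-second value is a placeholder.)
import socket
import urllib2

# Affects every socket created afterwards that has no explicit timeout of its own.
socket.setdefaulttimeout(10)

content = urllib2.urlopen('http://example.com/big-file').read()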
However, this only works if you're willing to globally modify the timeout for all users of the socket module. I'm running the request from within a Celery task, so doing this would mess up timeouts for the Celery worker code itself.
I'd be happy to hear any other solutions...