Timeout for python requests.get entire response

Posted 2020-01-24 01:46

I'm gathering statistics on a list of websites and I'm using requests for it for simplicity. Here is my code:

import requests

data = []
websites = ['http://google.com', 'http://bbc.co.uk']
for w in websites:
    r = requests.get(w, verify=False)
    # One row per site: final URL, body size, elapsed time, redirects, headers, cookies
    data.append((r.url, len(r.content), r.elapsed.total_seconds(), str([(l.status_code, l.url) for l in r.history]), str(r.headers.items()), str(r.cookies.items())))

Now, I want requests.get to time out after 10 seconds so the loop doesn't get stuck.

This question has come up before, but none of the existing answers are clean. I will put a bounty on this to get a good answer.

I hear that it may be a good idea not to use requests, but then how would I get the nice things requests offers (the ones in the tuple)?

19 answers
姐就是有狂的资本
Reply #2 · 2020-01-24 02:23

Pass timeout as a tuple, timeout=(connect timeout, read timeout), or as a single value (timeout=1) that applies to both:

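import requests

try:
    # (connect timeout, read timeout), both in seconds
    req = requests.request('GET', 'https://www.google.com', timeout=(1, 1))
    print(req)
except requests.ConnectTimeout:
    print("CONNECT TIME OUT")
except requests.ReadTimeout:
    print("READ TIME OUT")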
可以哭但决不认输i
Reply #3 · 2020-01-24 02:24

If it comes to that, create a watchdog thread that messes up requests' internal state after 10 seconds, e.g.:

  • closes the underlying socket, and ideally
  • triggers an exception if requests retries the operation

Note that, depending on the system libraries, you may be unable to set a deadline on DNS resolution.
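The watchdog idea can be sketched with a plain socket. This is only an illustration under assumptions: how to reach the socket that requests uses internally is not covered here, and recv_until_deadline is a hypothetical helper name. It relies on shutdown() waking a blocked recv(), which is the usual behaviour on Linux but should be checked on other platforms.

```python
import socket
import threading
import time


def recv_until_deadline(sock, deadline_seconds):
    """Read from sock until EOF; a watchdog aborts the read after the deadline.

    Hypothetical helper for illustration only, not part of requests.
    """
    fired = threading.Event()

    def kill():
        # Record that the watchdog fired, then break the connection.
        fired.set()
        try:
            sock.shutdown(socket.SHUT_RDWR)  # wakes a recv() blocked in another thread
        except OSError:
            pass  # the socket was already closed or disconnected

    watchdog = threading.Timer(deadline_seconds, kill)
    watchdog.daemon = True
    watchdog.start()

    chunks = []
    try:
        while True:
            try:
                chunk = sock.recv(4096)
            except OSError:
                break  # the socket was torn down under us
            if not chunk:
                break  # EOF, either from the peer or from the shutdown above
            chunks.append(chunk)
    finally:
        watchdog.cancel()

    if fired.is_set():
        raise TimeoutError("deadline exceeded while reading")
    return b"".join(chunks)


if __name__ == "__main__":
    # A connected pair of sockets; the peer never sends, so recv() would block forever.
    reader, silent_peer = socket.socketpair()
    start = time.monotonic()
    try:
        recv_until_deadline(reader, 0.2)
    except TimeoutError:
        print(f"aborted after {time.monotonic() - start:.1f} s")
    finally:
        reader.close()
        silent_peer.close()
```

The deadline covers the whole read, no matter how many chunks arrive. One known rough edge: if the watchdog fires at the very moment the peer finishes, a complete read is still reported as a timeout.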

爱情/是我丢掉的垃圾
Reply #4 · 2020-01-24 02:26

To create a timeout you can use signals.

The best way to solve this case is probably to

  1. Set a handler for the alarm signal that raises an exception.
  2. Schedule the alarm signal with a ten-second delay.
  3. Call the function inside a try-except-finally block.
  4. The except block is reached if the function timed out.
  5. In the finally block, cancel the alarm so it is not signaled later.

Here is some example code:

import signal
from time import sleep

class TimeoutException(Exception):
    """ Simple Exception to be called on timeouts. """
    pass

def _timeout(signum, frame):
    """ Raise a TimeoutException.

    This is intended for use as a signal handler.
    The signum and frame arguments passed to this are ignored.

    """
    # Raise TimeoutException with no message
    raise TimeoutException()

# Set the handler for the SIGALRM signal:
signal.signal(signal.SIGALRM, _timeout)
# Send the SIGALRM signal in 10 seconds:
signal.alarm(10)

try:
    # Do our code:
    print('This will take 11 seconds...')
    sleep(11)
    print('done!')
except TimeoutException:
    print('It timed out!')
finally:
    # Abort the sending of the SIGALRM signal:
    signal.alarm(0)

There are some caveats to this:

  1. It is not thread-safe: Python runs signal handlers in the main thread only, so you can't use this from any other thread.
  2. There is a slight delay between scheduling the signal and executing the actual code. This means that the example would time out even if it only slept for ten seconds.

But it's all in the standard Python library! Except for the sleep function import, it's only one import. If you are going to use timeouts in many places, you can easily put the TimeoutException, _timeout and the signaling in a function and just call that. Or you can make a decorator and put it on functions; see the answer linked below.
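A decorator version might look roughly like this. It is a minimal sketch rather than the linked answer's code: the decorator name timeout is my own choice, it uses signal.setitimer instead of signal.alarm so that fractional seconds work, and like everything SIGALRM-based it is Unix-only and must run in the main thread.

```python
import functools
import signal
import time


class TimeoutException(Exception):
    """Raised when a decorated call exceeds its time limit."""


def timeout(seconds):
    """Decorator factory: abort the wrapped call after `seconds` (main thread, Unix only)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def _handler(signum, frame):
                raise TimeoutException(f"{func.__name__} exceeded {seconds} s")

            # Install our handler and remember the previous one so it can be restored.
            previous = signal.signal(signal.SIGALRM, _handler)
            signal.setitimer(signal.ITIMER_REAL, seconds)
            try:
                return func(*args, **kwargs)
            finally:
                # Cancel the timer and restore the old handler, timed out or not.
                signal.setitimer(signal.ITIMER_REAL, 0)
                signal.signal(signal.SIGALRM, previous)
        return wrapper
    return decorator


@timeout(0.2)
def slow():
    time.sleep(1)
    return "finished"


@timeout(1)
def fast():
    return "finished"


if __name__ == "__main__":
    print(fast())
    try:
        slow()
    except TimeoutException as exc:
        print("timed out:", exc)
```

Restoring the previous handler in the finally block keeps the decorator from clobbering any SIGALRM handler the rest of the program has installed.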

You can also set this up as a "context manager" so you can use it with the with statement:

import signal

class Timeout:
    """ Timeout for use with the `with` statement. """

    class TimeoutException(Exception):
        """ Simple Exception to be called on timeouts. """
        pass

    @staticmethod
    def _timeout(signum, frame):
        """ Raise a TimeoutException.

        This is intended for use as a signal handler.
        The signum and frame arguments passed to this are ignored.

        """
        raise Timeout.TimeoutException()

    def __init__(self, timeout=10):
        self.timeout = timeout
        signal.signal(signal.SIGALRM, Timeout._timeout)

    def __enter__(self):
        signal.alarm(self.timeout)

    def __exit__(self, exc_type, exc_value, traceback):
        signal.alarm(0)
        return exc_type is Timeout.TimeoutException

# Demonstration:
from time import sleep

print('This is going to take maximum 10 seconds...')
with Timeout(10):
    sleep(15)
    print('No timeout?')
print('Done')

One possible downside of this context manager approach is that you can't tell whether the code actually timed out.
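One way around that, as a sketch only: have __enter__ return the manager and record the outcome in an attribute. The class name FlaggedTimeout and the timed_out attribute are made up for this example, and setitimer is used instead of alarm so sub-second limits work (Unix-only, main thread only).

```python
import signal
import time


class FlaggedTimeout:
    """Like the context manager above, but remembers whether the block timed out."""

    class TimeoutException(Exception):
        """Raised inside the `with` block when the time limit is hit."""

    def __init__(self, seconds=10):
        self.seconds = seconds
        self.timed_out = False
        self._previous = None

    def _handler(self, signum, frame):
        raise FlaggedTimeout.TimeoutException()

    def __enter__(self):
        # Install the handler on entry and remember the old one.
        self._previous = signal.signal(signal.SIGALRM, self._handler)
        signal.setitimer(signal.ITIMER_REAL, self.seconds)
        return self  # lets the caller write `with FlaggedTimeout(...) as t:`

    def __exit__(self, exc_type, exc_value, traceback):
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, self._previous)
        # Record the outcome, and swallow only our own timeout exception.
        self.timed_out = exc_type is FlaggedTimeout.TimeoutException
        return self.timed_out


if __name__ == "__main__":
    with FlaggedTimeout(0.2) as t:
        time.sleep(1)
    print("timed out?", t.timed_out)
```

After the with block, t.timed_out tells you which way it went, while any other exception still propagates as usual.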

Sources and recommended reading:

来,给爷笑一个
Reply #5 · 2020-01-24 02:26

Try this request with a timeout and error handling:

import requests

url = "http://google.com"
try:
    r = requests.get(url, timeout=10)
except requests.exceptions.Timeout as e:
    # Raised for both connect and read timeouts
    print(e)
Root(大扎)
Reply #6 · 2020-01-24 02:29

What about using eventlet? If you want to time out the request after 10 seconds, even if data is still being received, this snippet will work for you:

import requests
import eventlet
eventlet.monkey_patch()

with eventlet.Timeout(10):
    requests.get("http://ipv4.download.thinkbroadband.com/1GB.zip", verify=False)
放我归山
Reply #7 · 2020-01-24 02:32

Set the timeout parameter:

r = requests.get(w, verify=False, timeout=10)  # 10 seconds

As long as you don't set stream=True on that request, this will cause the call to requests.get() to time out if the connection takes more than ten seconds, or if the server doesn't send data for more than ten seconds. Note that this is not a limit on the total download time: a server that keeps sending data, however slowly, never triggers it.
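If you do want a cap on the whole response, one option is to stream the body and check a clock between chunks. The sketch below is an assumption-laden illustration, not part of requests: read_with_deadline and DeadlineExceeded are hypothetical names, and the helper accepts any iterable of byte chunks so it can be tried without a network.

```python
import time


class DeadlineExceeded(Exception):
    """Raised when the total time budget for reading a body is used up."""


def read_with_deadline(chunks, total_seconds):
    """Collect byte chunks from an iterable, giving up once total_seconds have passed.

    The clock is only checked between chunks, so each individual read must be
    bounded by its own timeout or the check may never get a chance to run.
    """
    deadline = time.monotonic() + total_seconds
    body = bytearray()
    for chunk in chunks:
        body.extend(chunk)
        if time.monotonic() > deadline:
            raise DeadlineExceeded(f"body not complete after {total_seconds} s")
    return bytes(body)


def slow_chunks(count, delay):
    """Stand-in for a slow server: yields `count` one-byte chunks, `delay` seconds apart."""
    for _ in range(count):
        time.sleep(delay)
        yield b"x"


# Hypothetical use with requests (not executed here):
#   r = requests.get(url, stream=True, timeout=(3, 3))
#   body = read_with_deadline(r.iter_content(chunk_size=8192), 10)

if __name__ == "__main__":
    print(len(read_with_deadline(slow_chunks(3, 0.01), 1)), "bytes read")
    try:
        read_with_deadline(slow_chunks(50, 0.05), 0.2)
    except DeadlineExceeded as exc:
        print("gave up:", exc)
```

With a per-read timeout of 3 seconds and a budget of 10, the worst case is roughly 13 seconds, since the last read can overrun the deadline before the check runs.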
