urllib2.urlopen hangs forever despite timeout

Posted 2019-03-21 14:17

Question:

Hopefully this is a simple question, but it's driving me crazy. I'm using Python 2.7.3 on an out-of-the-box installation of Ubuntu 12.10 server. I kept zooming in on the problem until I got to this snippet:

import urllib2
x = urllib2.urlopen("http://casacinema.eu/movie-film-Matrix+trilogy+123+streaming-6165.html", timeout=5)

It simply hangs forever and never times out. I'm evidently doing something wrong. Could anybody please help? Thank you very much indeed!

Matteo

Answer 1:

It looks like you are experiencing a proxy issue. Here's a good explanation of how to work around it: Trying to access the Internet using urllib2 in Python.
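If a proxy picked up from the environment (e.g. an http_proxy variable) is the culprit, a minimal sketch along these lines may help; passing an empty ProxyHandler makes urllib2 ignore any proxy settings it would otherwise detect:

import urllib2

# Assumption: the hang is caused by a misbehaving proxy taken from the
# environment. An empty ProxyHandler disables proxy detection entirely.
opener = urllib2.build_opener(urllib2.ProxyHandler({}))
response = opener.open(
    "http://casacinema.eu/movie-film-Matrix+trilogy+123+streaming-6165.html",
    timeout=5)
print response.getcode()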

I've executed your code on my Ubuntu box with Python 2.7.3 and didn't see any errors.

Also, consider using requests:

import requests

response = requests.get("http://casacinema.eu/movie-film-Matrix+trilogy+123+streaming-6165.html", timeout=5)
print response.status_code
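If a proxy really is in play, requests also lets you pass one explicitly and raise a clean error on timeout. A rough sketch, where the proxy address is only a placeholder:

import requests

# Hypothetical proxy address -- replace with your actual proxy, or drop the
# proxies argument entirely to fall back to the environment settings.
proxies = {"http": "http://10.0.0.1:3128"}

try:
    response = requests.get(
        "http://casacinema.eu/movie-film-Matrix+trilogy+123+streaming-6165.html",
        timeout=5, proxies=proxies)
    print response.status_code
except requests.exceptions.Timeout:
    print "request timed out after 5 seconds"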

See also:

  • Proxies with Python 'Requests' module


Answer 2:

The original poster said they did not understand why it would hang, and they also wanted a way to keep urllib.request.urlopen from hanging. I can't say how to keep it from hanging, but if it helps anyone, here is why it can hang.

The Python-urllib/3.6 client is picky. For example, it expects the server to return HTTP/1.1 200 OK, not HTTP 200 OK. It also expects the server to actually close the connection when it sends Connection: close in its headers.

The best way to diagnose this is to capture the raw server response and compare it with a response from a server that you know works. Then, if necessary, create a test server and manipulate its response to determine exactly which difference causes the hang. At the very least, that may point to a change on the server side that stops the client from hanging.
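For example, here is a rough sketch of grabbing the raw response with a plain socket (the host and path are only illustrative) so the status line and headers can be compared byte for byte with a server that works:

import socket

# Illustrative host/path -- substitute the server you are diagnosing.
host, path = "casacinema.eu", "/"

s = socket.create_connection((host, 80), timeout=5)
s.sendall("GET %s HTTP/1.1\r\nHost: %s\r\nConnection: close\r\n\r\n" % (path, host))

raw = ""
while True:
    chunk = s.recv(4096)
    if not chunk:
        break
    raw += chunk
s.close()

# The first line should read something like "HTTP/1.1 200 OK"; a malformed
# status line (e.g. "HTTP 200 OK") is the kind of difference to look for.
print raw.split("\r\n")[0]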