Python urllib2.urlopen(url) blocks the process

Posted 2019-04-17 09:53

Question:

I am using urllib2.urlopen(), and my process is getting blocked.

I am aware that urllib2.urlopen() has a default timeout.

How can I keep this call from blocking?
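Worth knowing here: the timeout parameter of urllib2.urlopen() defaults to socket._GLOBAL_DEFAULT_TIMEOUT, which means "use the global socket default", and that global default is None (no timeout at all) unless something has called socket.setdefaulttimeout(). The small sketch below runs on Python 2.6 and Python 3 and shows the global setting; the value 10.0 is an arbitrary example, not a recommendation.

```python
import socket

# At interpreter start-up this is normally None: new sockets block forever.
print(socket.getdefaulttimeout())

# Set a process-wide default, in seconds (10.0 is an arbitrary example value).
# Every socket created afterwards inherits it, including the ones urlopen()
# opens when it is called without an explicit timeout= argument.
socket.setdefaulttimeout(10.0)

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
print(sock.gettimeout())  # 10.0
sock.close()
```

Because this affects every socket in the process, passing timeout= to the individual urlopen() call is usually the more targeted option.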

The backtrace is:

(gdb) bt 
#0 0x0000003c6200dc35 in recv () from /lib64/libpthread.so.0 
#1 0x00002b88add08137 in ?? () from /usr/lib64/python2.6/lib-dynload/_socketmodule.so 
#2 0x00002b88add0830e in ?? () from /usr/lib64/python2.6/lib-dynload/_socketmodule.so 
#3 0x000000310b2d8e19 in PyEval_EvalFrameEx () from /usr/lib64/libpython2.6.so.1.0
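The backtrace shows the thread sitting in recv(), i.e. waiting for response bytes on a socket that has no timeout. A minimal sketch of the per-call fix follows, written for Python 3's urllib.request (the successor of urllib2; on Python 2.6+ the same keyword is urllib2.urlopen(url, timeout=...)). start_silent_server() and fetch() are hypothetical helper names, and the silent server is only a local stand-in for a host that accepts the connection but never answers.

```python
import socket
import urllib.error
import urllib.request

# An opener that ignores *_proxy environment variables, so the 127.0.0.1 demo
# is reached directly; plain urllib.request.urlopen() takes the same timeout=.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def start_silent_server():
    """Listening socket that never accepts or answers (a stand-in for a hung server)."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)  # the kernel completes the TCP handshake, so connect() succeeds
    return srv


def fetch(url, timeout):
    """Return the body, or None if the server did not answer within `timeout` seconds."""
    try:
        with _OPENER.open(url, timeout=timeout) as resp:
            return resp.read()
    except socket.timeout:
        # Raised while waiting for the response: the recv() seen in the backtrace.
        return None
    except urllib.error.URLError as exc:
        # Timeouts during connecting/sending arrive wrapped in URLError.
        if isinstance(exc.reason, socket.timeout):
            return None
        raise


if __name__ == "__main__":
    srv = start_silent_server()
    url = "http://127.0.0.1:%d/" % srv.getsockname()[1]
    print(fetch(url, timeout=0.5))  # None after about 0.5 s instead of hanging
    srv.close()
```

Note that the timeout applies to each blocking socket operation (connecting, each read), not to the request as a whole, so a server that keeps trickling bytes can still take longer overall.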

Answer 1:

If your problem is that you have to wait for urllib to finish reading:

read() is a blocking operation in Python.

If you want to make asynchronous requests:

  • Do the reading in a non-main thread: http://docs.python.org/library/threading.html

  • Use the requests library and its asynchronous requests: http://docs.python-requests.org/en/latest/user/advanced/#asynchronous-requests
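A minimal sketch of the first option, reading in a non-main thread so that the main thread waits only as long as it chooses. It uses Python 3's urllib.request; fetch_in_background is a hypothetical helper name. The worker still passes its own timeout because a Python thread cannot be killed from outside: the main thread merely stops waiting for it.

```python
import threading
import urllib.request

# Ignore *_proxy environment variables so a local test URL is reached directly.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def fetch_in_background(url, wait, request_timeout=10.0):
    """Read `url` in a daemon thread, waiting at most `wait` seconds for it.

    Returns (done, result). When done is True, result is the body bytes or the
    exception the worker raised; when False, result is None and the worker keeps
    running until it finishes or hits request_timeout (threads cannot be killed).
    """
    box = {}

    def worker():
        try:
            with _OPENER.open(url, timeout=request_timeout) as resp:
                box["result"] = resp.read()
        except Exception as exc:  # hand the failure back instead of losing it
            box["result"] = exc

    t = threading.Thread(target=worker, daemon=True)  # does not block interpreter exit
    t.start()
    t.join(wait)  # the main thread is blocked for at most `wait` seconds
    done = not t.is_alive()
    return done, (box.get("result") if done else None)
```

For example, done, body = fetch_in_background(url, wait=2.0) lets the caller carry on after two seconds when done is False; the daemon flag means a still-pending worker does not keep the process alive at exit.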

If your problem is that you need to set a timeout:

Again, use the requests library mentioned above.

Answer 2:

You can use strace (or a similar tool) to find out which system call is actually blocking your Python script. For example, on Linux:

yourscript.py:

from urllib2 import urlopen
# No timeout argument: if the host never connects or answers, this call blocks.
urlopen("http://somesite.local/foobar.html")

$ strace python yourscript.py

... many system calls omitted ...
socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 3
connect(3, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("127.0.0.1")}, 16
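On Python 3.3+, the standard faulthandler module offers a Python-level view that complements strace and gdb: it can dump the Python traceback of every thread if a call is still running after a deadline, which shows which line of the script is stuck. This is a sketch; run_with_hang_report is a hypothetical helper name, and `out` must be a real file with a fileno() (for example sys.__stderr__ or an open log file).

```python
import faulthandler


def run_with_hang_report(func, after, out):
    """Call func(); if it is still running `after` seconds later, write the Python
    traceback of every thread to `out` without stopping the process."""
    faulthandler.dump_traceback_later(after, exit=False, file=out)
    try:
        return func()
    finally:
        # Disarm the watchdog whether func() returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

For example, run_with_hang_report(lambda: urlopen(url).read(), after=30, out=sys.__stderr__) would print where the read is stuck if it takes longer than 30 seconds, while still letting it continue.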