
Question:

I'm just playing around and I'm trying to grab information from websites. Unfortunately, with the following code:

import sys
import socket
import re
from urlparse import urlsplit

url = urlsplit(sys.argv[1])


sock = socket.socket()
sock.connect((url[0] + '://' + url[1],80))
path = url[2]
if not path:
    path = '/'

print path
sock.send('GET ' + path + ' HTTP/1.1\r\n'
    + 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US) AppleWebKit/525.19 (KHTML, like Gecko) Chrome/0.3.154.9 Safari/525.19\r\n'
    + 'Accept: */*\r\n'
    + 'Accept-Language: en-US,en\r\n'
    + 'Accept-Charset: ISO-8859-1,*,utf-8\r\n'
    + 'Host: 68.33.143.182\r\n'
    + 'Connection: Keep-alive\r\n'
    + '\r\n')

I get the following error:

Traceback (most recent call last):
  File "D:\Development\Python\PyCrawler\PyCrawler.py", line 10, in <module>
    sock.connect((url[0] + '://' + url[1],80))
  File "<string>", line 1, in connect
socket.gaierror: (11001, 'getaddrinfo failed')

The only time I do not get an error is when the URL passed is http://www.reddit.com. Every other URL I have tried raises socket.gaierror. Can anyone explain this, and possibly suggest a solution?

Answer 1:

Please please please please please please please don't do this.

urllib and urllib2 are your friends.

Read the "missing" urllib2 manual if you are having trouble with it.
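As a rough sketch of what the higher-level route might look like: in Python 3, urllib2 was merged into urllib.request, so the example below uses that module rather than the Python 2 names mentioned above. The helper names build_request and fetch, and the User-Agent string, are illustrative assumptions and do not come from the question.

```python
import sys
import urllib.request


def build_request(url, user_agent="PyCrawler-example/0.1"):
    """Build a GET request; urllib derives the Host header from the URL."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, timeout=10):
    """Download the response body as bytes (performs a real network request)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        return response.read()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fetch_example.py http://www.example.com/
    print(len(fetch(sys.argv[1])), "bytes received")
```

The library takes care of name resolution, the request line, and the Host header, which are the parts the hand-written socket code has to get right on its own.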

Answer 2:

sock.connect((url[0] + '://' + url[1],80))

Do not do that, instead do this:

sock.connect((url[1], 80))

connect expects a hostname, not a URL.

Actually, you should probably use something higher-level than sockets to do HTTP. Maybe httplib.
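To make the difference concrete, here is a minimal sketch that contrasts the string the question's code passes to connect with the bare hostname. It is written for Python 3, where urlsplit lives in urllib.parse. The helper name connection_target and the default port of 80 are assumptions made for illustration.

```python
from urllib.parse import urlsplit  # Python 2: from urlparse import urlsplit


def connection_target(url):
    """Return (hostname, port, path) suitable for a raw socket connection."""
    parts = urlsplit(url)
    host = parts.hostname       # bare hostname, without the scheme
    port = parts.port or 80     # assumption: plain HTTP when no port is given
    path = parts.path or "/"    # an empty path means the site root
    return host, port, path


parts = urlsplit("http://www.example.com/index.html")

# What the question's code builds: scheme + '://' + netloc.
# This is a URL prefix, not a hostname, so the DNS lookup fails.
bad_host = parts[0] + "://" + parts[1]

good = connection_target("http://www.example.com/index.html")

print(bad_host)
print(good)
```

The same hostname would normally also go into the Host header of the request, in place of a hard-coded IP address.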

Answer 3:

Have you ever altered your Hosts file? If it has an entry for Reddit but not much else, that might explain that site's unique result.

Answer 4:

You forgot to resolve the hostname:

addr = socket.gethostbyname(url[1])
...
sock.connect((addr,80))
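A small sketch of the explicit lookup, with the failure case handled. Note that socket.connect performs this resolution itself when it is given a hostname, so the separate step is mainly useful for inspecting the address or handling lookup errors. What matters in both cases is that the name is a bare hostname such as url[1], not a string that still contains the scheme. The helper name resolve is an assumption for illustration.

```python
import socket


def resolve(hostname):
    """Return the IPv4 address for hostname, or None if the lookup fails."""
    try:
        return socket.gethostbyname(hostname)
    except socket.gaierror:
        # The same exception type the question reports ('getaddrinfo failed').
        return None


# A dotted-quad IPv4 address is returned unchanged, without a DNS query.
print(resolve("127.0.0.1"))
```

Passing a string that still contains the scheme and '://' to resolve would be expected to fail in the same way as the connect call in the question, since it is not a valid hostname.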

Answer 5:

Use urllib2. Or BeautifulSoup.
