urllib2 returns 404 for a website which displays fine in browsers

Posted 2019-06-27 22:49

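I can't open a particular URL with urllib2. The same approach works fine with other sites, such as "http://www.google.com", but not with this one (which also displays fine in a browser).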

My simple code:

from BeautifulSoup import BeautifulSoup
import urllib2

url="http://www.experts.scival.com/einstein/"
response=urllib2.urlopen(url)
html=response.read()
soup=BeautifulSoup(html)
print soup

Can anyone help me get this working?

Here is the error I get:

Traceback (most recent call last):
  File "/Users/jontaotao/Documents/workspace/MedicalSchoolInfo/src/AlbertEinsteinCollegeOfMedicine_SciValExperts/getlink.py", line 12, in <module>
    response=urllib2.urlopen(url);
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
    return _opener.open(url, data, timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 432, in error
    result = self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 619, in http_error_302
    return self.parent.open(new, timeout=req.timeout)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 400, in open
    response = meth(req, response)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 513, in http_response
    'http', request, response, code, msg, hdrs)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 438, in error
    return self._call_chain(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 372, in _call_chain
    result = func(*args)
  File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 521, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found

Thanks.

Answer 1:

I just tried this and got a 404 status code along with the page content.

My guess is that the site does User-Agent detection and, accidentally or deliberately, does not serve content to Python's urllib.

To clarify: with urllib, urlopen returned a response object carrying the 404 code and the HTML content. With urllib2.urlopen, a urllib2.HTTPError exception is raised instead.

I suggest setting your User-Agent to something that looks like a browser. There is a question about this here: Changing user agent on urllib2.urlopen
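A minimal sketch of that suggestion, written for Python 3's urllib.request (the successor to urllib2; in Python 2 the same `headers` argument works with urllib2.Request). The browser-like User-Agent string is an arbitrary example, and `build_request` / `fetch_html` are hypothetical helper names. Whether this particular site then returns 200 depends on what it actually checks, which is only a guess in this answer.

```python
import urllib.request

# Illustrative browser-like User-Agent (assumption: any realistic browser string would do).
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def build_request(url, user_agent=BROWSER_UA):
    """Hypothetical helper: a Request that sends a browser-like User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_html(url, timeout=10):
    """Hypothetical helper: fetch the page with the custom header and decode it to text."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as response:
        # Fall back to UTF-8 if the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Usage: `html = fetch_html("http://www.experts.scival.com/einstein/")`, then pass `html` to BeautifulSoup as in the question. If the site still answers 404, the User-Agent was probably not the cause.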



Answer 2:

You can use try/except to catch the error:

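import urllib2

def open_url(req):
    try:
        return urllib2.urlopen(req)
    except urllib2.HTTPError as e:
        print e.code  # HTTP status, e.g. 404
        print e.msg   # reason phrase, e.g. "Not Found"
        return None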
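Because an HTTPError is itself a file-like response, the except branch can also read the error page the server sent, which is the 404 content that plain urllib returned in Answer 1. Below is a sketch using Python 3's urllib.request and urllib.error; `fetch_status_and_body` and its `opener` parameter are hypothetical names, and `opener` exists only so the function can be exercised without network access.

```python
import urllib.error
import urllib.request


def fetch_status_and_body(req, opener=urllib.request.urlopen):
    """Hypothetical helper: return (status, body_bytes) even for HTTP error responses.

    `req` may be a URL string or a urllib.request.Request.
    `opener` defaults to urllib.request.urlopen; it is injectable for testing.
    """
    try:
        with opener(req) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as e:
        # HTTPError doubles as a response object, so the error page body is readable.
        return e.code, e.read()
```

For example, `status, body = fetch_status_and_body("http://www.experts.scival.com/einstein/")` returns the 404 status and the page bytes instead of raising, so you can inspect what the server actually sent.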


Answer 3:

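Hmm... are you sure the URL is valid? Try "http://www.google.com"; I have similar code and it works with urllib without any problems. Alternatively, you can use a try/except statement to see the details of the error. Of course, MattH's answer is very close to the truth :)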



Source: urllib2 returns 404 for a website which displays fine in browsers