BeautifulSoup functionality not working properly in a specific scenario

Posted 2019-10-18 01:39

I want to read the following URL with urllib2: http://frcwest.com/ and then search the data for the meta redirect.

It reads the following data:

    <!--?xml version="1.0" encoding="UTF-8"?--><!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml"><head><title></title><meta content="0;url= Home.html" http-equiv="refresh"/></head><body></body></html>

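Reading it into BeautifulSoup works fine. However, for some reason none of the search functions work in this particular scenario, and I don't understand why. BeautifulSoup has been great for me in every other case. But even simply trying: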

    soup.findAll('meta')

returns no results.

My end goal is to run:

    soup.find("meta",attrs={"http-equiv":"refresh"})

But if even:

    soup.findAll('meta')

doesn't work, then I'm stuck. Any insight into this mystery would be greatly appreciated, thanks!
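For comparison, here is a minimal sketch of the same end goal (find the meta refresh tag and resolve its target) using the Python 3 standard library's html.parser instead of BeautifulSoup. This is a swapped-in technique, not the asker's code. html.parser is a streaming tokenizer rather than a tree builder, so it reports each start tag as it meets it, and the leading comment and doctype go to separate callbacks instead of affecting the search. The names MetaRefreshFinder, parse_refresh and find_meta_refresh are hypothetical, and parse_refresh assumes a content value shaped like "0;url= Home.html" (a whole-number delay, then an optional url=).

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MetaRefreshFinder(HTMLParser):
    """Hypothetical helper: collects the content of every <meta http-equiv="refresh">."""

    def __init__(self):
        super().__init__()
        self.refresh_contents = []

    def handle_starttag(self, tag, attrs):
        # Tag and attribute names arrive lower-cased; attrs is a list of (name, value).
        # Self-closing <meta .../> also lands here via the default handle_startendtag.
        if tag == "meta":
            attr_map = dict(attrs)
            if (attr_map.get("http-equiv") or "").lower() == "refresh":
                self.refresh_contents.append(attr_map.get("content") or "")


def parse_refresh(content, base_url):
    """Split a value like '0;url= Home.html' into (delay, absolute URL or None).

    Assumes the delay is a whole number, as in the question's markup.
    """
    delay_part, _, rest = content.partition(";")
    delay = int(delay_part.strip() or 0)
    target = rest.strip()
    if target.lower().startswith("url="):
        target = target[4:].strip()
    target = target.strip("'\"")  # tolerate url='...' or url="..."
    return (delay, urljoin(base_url, target) if target else None)


def find_meta_refresh(markup, base_url):
    """Return (delay, url) for the first meta refresh tag, or None if there is none."""
    finder = MetaRefreshFinder()
    finder.feed(markup)
    finder.close()
    if not finder.refresh_contents:
        return None
    return parse_refresh(finder.refresh_contents[0], base_url)
```

With the markup shown in the question and a base URL of http://frcwest.com/, find_meta_refresh should return (0, 'http://frcwest.com/Home.html').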

Answer 1:

It's the comment and the doctype here that throw off the parser, and in turn BeautifulSoup.

Even the html tag seems to have gone missing:

>>> soup.find('html') is None
True

However, it is still present among the top-level .contents, so you can locate it and search from there:

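# soup is the BeautifulSoup object built from the page above.
# Walk its top-level children and restart the search from the <html> element.
for elem in soup:
    if getattr(elem, 'name', None) == u'html':
        soup = elem
        break

soup.find_all('meta')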

Demo:

>>> for elem in soup:
...     if getattr(elem, 'name', None) == u'html':
...         soup = elem
...         break
... 
>>> soup.find_all('meta')
[<meta content="0;url= Home.html" http-equiv="refresh"/>]
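If you would rather not walk .contents, another thing worth trying, assuming the leading XML declaration (or its comment-wrapped form <!--?xml ...?-->) is what trips up the tree builder, is to strip it before parsing. This is an untested sketch, not a confirmed fix: whether it helps may depend on your BeautifulSoup version and the underlying parser, it leaves the doctype in place, and strip_xml_prolog is a hypothetical helper name.

```python
import re

# Hypothetical helper: matches an XML declaration at the very start of the
# document, either as written (<?xml ...?>) or in the comment-wrapped form
# (<!--?xml ...?-->) shown in the question, plus surrounding whitespace.
_XML_PROLOG = re.compile(
    r"^\s*(?:<\?xml[^>]*\?>|<!--\?xml.*?\?-->)\s*",
    re.IGNORECASE | re.DOTALL,
)


def strip_xml_prolog(markup):
    """Return markup with a leading XML declaration removed; the DOCTYPE is kept."""
    return _XML_PROLOG.sub("", markup, count=1)
```

You would then parse the cleaned string as before, e.g. soup = BeautifulSoup(strip_xml_prolog(html)), and check whether soup.find_all('meta') now finds the tag.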

Source: Beautifulsoup functionality not working properly in specific scenario