Parsing HTML in python - lxml or BeautifulSoup? Wh - Answers, page 2

Posted 2019-01-03 05:10

From what I can make out, the two main HTML parsing libraries in Python are lxml and BeautifulSoup. I've chosen BeautifulSoup for a project I'm working on, but I chose it for no particular reason other than finding the syntax a bit easier to learn and understand. But I see a lot of people seem to favour lxml and I've heard that lxml is faster.

So I'm wondering what are the advantages of one over the other? When would I want to use lxml and when would I be better off using BeautifulSoup? Are there any other libraries worth considering?

Tags: python beautifulsoup html-parsing lxml

7 answers

不美不萌又怎样

Reply #2 · 2019-01-03 05:39

I would definitely use EHP. It is faster than lxml, much more elegant, and simpler to use.

Check it out: https://github.com/iogf/ehp

<body ><em > foo  <font color="red" ></font></em></body>


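# Import only the parser class instead of using a wildcard import.
from ehp import Html

data = '''<html> <body> <em> Hello world. </em> </body> </html>'''

# Build the DOM from the HTML string.
html = Html()
dom = html.feed(data)

# Print the text of every <em> element; print() works on Python 2 and 3.
for ind in dom.find('em'):
    print(ind.text())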

Output:

Hello world.
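
For comparison, since the question also asks about other libraries, the same extraction can be sketched with the standard library's html.parser module, which needs no third-party install. This is only a rough sketch: the names EmTextCollector and em_texts are made up for illustration, and it assumes the <em> tags in the input are properly closed.

```python
from html.parser import HTMLParser


class EmTextCollector(HTMLParser):
    """Collect the text found inside <em> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # number of <em> tags currently open
        self.texts = []   # one entry per outermost <em> element

    def handle_starttag(self, tag, attrs):
        if tag == 'em':
            if self.depth == 0:
                # Start a new entry for an outermost <em>.
                self.texts.append('')
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'em' and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Keep text only while we are inside an <em> element.
        if self.depth > 0:
            self.texts[-1] += data


def em_texts(markup):
    """Return the stripped text of each outermost <em> element."""
    parser = EmTextCollector()
    parser.feed(markup)
    parser.close()
    return [text.strip() for text in parser.texts]


data = '''<html> <body> <em> Hello world. </em> </body> </html>'''

for text in em_texts(data):
    print(text)
```

This prints Hello world. with the surrounding whitespace stripped. It takes noticeably more code than the EHP version, which is the usual trade-off with html.parser: no dependency, but you write the tree-walking logic yourself.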
