HTML parser in Python

Using the Python documentation I found the HTML parser, but I have no idea which library to import to use it. How do I find this out (bearing in mind it doesn't say on the page)?

Tags: python import

8 answers

小情绪 Triste *

#2 · 2019-02-09 18:25

Try:

import HTMLParser

In Python 3.0 the HTMLParser module was renamed to html.parser; the Python 2 documentation for HTMLParser notes the rename.

Python 3.0 and above

import html.parser

Python 2 (2.2 and above)

import HTMLParser
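If the same script has to run under both Python 2 and Python 3, the two imports above can be combined with a fallback. This is only a minimal sketch using the standard library; the variable name `parser` is just an illustration.

```python
# Try the Python 3 location first, then fall back to the Python 2 module name.
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2

# The class is used the same way on both versions.
parser = HTMLParser()
parser.feed('<html></html>')
parser.close()
```

After this, the rest of the code can refer to `HTMLParser` without caring which interpreter is running.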


仙女界的扛把子

#3 · 2019-02-09 18:28

You probably really want BeautifulSoup; its documentation has examples.

But in any case, with the Python 2 module:

>>> import HTMLParser
>>> h = HTMLParser.HTMLParser()
>>> h.feed('<html></html>')
>>> h.get_starttag_text()
'<html>'
>>> h.close()
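The session above uses the Python 2 module name. On Python 3 the base class does nothing useful by itself, so in practice you subclass it and override the handler methods. A rough sketch, assuming Python 3; the class name `LinkCollector` and the sample markup are made up for illustration:

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


collector = LinkCollector()
collector.feed('<html><body><a href="https://example.com/">x</a></body></html>')
collector.close()
print(collector.links)  # prints ['https://example.com/']
```

Other handlers such as `handle_endtag` and `handle_data` can be overridden in the same way.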


小情绪 Triste *

#4 · 2019-02-09 18:34

For real-world HTML processing I'd recommend BeautifulSoup. It is great and takes away much of the pain. Installation is easy.


做个烂人

#5 · 2019-02-09 18:35

You may be interested in lxml. It is a separate package and has C components, but it is the fastest. It also has a very nice API that lets you easily list the links or forms in an HTML document, sanitize HTML, and more. It can also parse HTML that is not well-formed (this is configurable).


ゆ、 Hurt°

#6 · 2019-02-09 18:39

I would recommend using the Beautiful Soup module instead; it has good documentation.


甜甜的少女心

#7 · 2019-02-09 18:39

You should also look at html5lib for Python, as it tries to parse HTML in a way that closely resembles what web browsers do, especially when dealing with invalid HTML (which is more than 90% of today's web).

