Check if html tag is self-closing - HTMLparser - P

Posted 2019-08-29 02:25

Is there a way to check if a tag is a self-closing tag with HTMLParser?

I know self-closing tags are handled by the built-in method handle_startendtag().

However, it only handles them if they are explicitly closed, e.g. <img src="x.jpg"/>

and not: <img src="x.jpg">
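
The difference can be reproduced with a small sketch. CallRecorder and record_calls are hypothetical names used only for illustration; the handler methods themselves are the standard html.parser.HTMLParser ones.

```python
from html.parser import HTMLParser


class CallRecorder(HTMLParser):
    """Hypothetical helper: records which handler fires for each tag."""

    def __init__(self):
        super().__init__()
        self.calls = []

    def handle_starttag(self, tag, attrs):
        # Fires for an opening tag written without a trailing slash.
        self.calls.append(("starttag", tag))

    def handle_startendtag(self, tag, attrs):
        # Fires only when the tag is written with an explicit "/>".
        self.calls.append(("startendtag", tag))


def record_calls(html):
    parser = CallRecorder()
    parser.feed(html)
    parser.close()
    return parser.calls


print(record_calls('<img src="x.jpg"/>'))  # [('startendtag', 'img')]
print(record_calls('<img src="x.jpg">'))   # [('starttag', 'img')]
```

So the parser reports how the tag was written, not whether the element is allowed to have content.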

I am making a program that takes an HTML file and spits out a Sass template.

I want to close the img tags in the output file that are not explicitly closed in the HTML file.

Cheers

2 answers
Viruses.
#2 · 2019-08-29 03:10

Not exactly a Python-specific solution, but if you want to know which tags have this "self-closing property", you can look at the official HTML5 specs: these are formally known as void elements.

area, base, br, col, embed, hr, img, input, keygen, link, menuitem,
meta, param, source, track, wbr

Strictly speaking, void elements do not have closing tags at all, but permit an extra / immediately before the >.
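
In Python, one way to apply this is to keep that list as a set and consult it from handle_starttag(). The following is a minimal sketch, not a complete serializer: VoidClosingParser and close_void_tags are hypothetical names, and comments, doctypes and processing instructions are simply dropped from the output.

```python
from html import escape
from html.parser import HTMLParser

# Void elements, copied from the list quoted above.
VOID_ELEMENTS = frozenset({
    "area", "base", "br", "col", "embed", "hr", "img", "input", "keygen",
    "link", "menuitem", "meta", "param", "source", "track", "wbr",
})


class VoidClosingParser(HTMLParser):
    """Hypothetical helper: re-emits HTML with every void element written as <tag/>."""

    def __init__(self):
        # Keep character references as written instead of decoding them.
        super().__init__(convert_charrefs=False)
        self.out = []

    def _render(self, tag, attrs, self_closing):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # attribute without a value, e.g. "disabled"
                parts.append(name)
            else:
                parts.append('{}="{}"'.format(name, escape(value, quote=True)))
        return "<" + " ".join(parts) + ("/>" if self_closing else ">")

    def handle_starttag(self, tag, attrs):
        # A void element is closed even if the source omitted the slash.
        self.out.append(self._render(tag, attrs, tag in VOID_ELEMENTS))

    def handle_startendtag(self, tag, attrs):
        self.out.append(self._render(tag, attrs, True))

    def handle_endtag(self, tag):
        # Void elements never get a separate closing tag.
        if tag not in VOID_ELEMENTS:
            self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&{};".format(name))

    def handle_charref(self, name):
        self.out.append("&#{};".format(name))


def close_void_tags(html):
    parser = VoidClosingParser()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


print(close_void_tags('<p>Hi<img src="x.jpg"><br></p>'))
# <p>Hi<img src="x.jpg"/><br/></p>
```

Checking membership in the set answers the original question directly; the rest of the class only exists to write the tags back out.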

Evening l夕情丶
#3 · 2019-08-29 03:10

A simple solution is to use BeautifulSoup.

In [76]: from bs4 import BeautifulSoup

In [77]: BeautifulSoup('<img src="x.jpg">', 'html.parser')
Out[77]: <img src="x.jpg"/>

You can also check whether a tag is self-closing.

from bs4 import BeautifulSoup

html = '<p>text</p><img src="x.jpg"><br>'  # example input
soup = BeautifulSoup(html, 'html.parser')
tags = soup.find_all(True)  # every Tag in the tree, not just top-level nodes
self_closing = [tag for tag in tags if tag.is_empty_element]

Every Tag has an is_empty_element property (called isSelfClosing in BeautifulSoup 3), so you can filter on it.
