Best way to parse invalid HTML in PHP

Posted 2020-02-04 02:37

Is there a better approach to parsing invalid HTML than running Tidy on it?

Side note: there are situations where Tidy is not available. As I understand it, regular expressions are also not recommended for parsing HTML.

Tags: php html parsing
2 answers
时光不老,我们不散
Reply #2 · 2020-02-04 02:53

SimpleHTMLDOM is known to be more lenient than PHP's native DOM functions.

仙女界的扛把子
Reply #3 · 2020-02-04 03:06

I would try DOMDocument::loadHTML(): http://php.net/manual/en/domdocument.loadhtml.php

From that page:

The function parses the HTML contained in the string source. Unlike loading XML, HTML does not have to be well-formed to load. This function may also be called statically to load and create a DOMDocument object.
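As a rough illustration of that behavior, here is a minimal sketch, assuming the dom/libxml extension is enabled. The sample markup and variable names are made up for this example and are not from the manual. libxml_use_internal_errors(true) is used so that parse problems are collected instead of being raised as PHP warnings.

```php
<?php
// Deliberately malformed markup: unclosed <p> and <b>, plus a stray </div>.
$html = '<html><body><p>First paragraph<p>Second <b>bold text</div></body></html>';

// Collect libxml parse errors internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Keep the recorded parse errors for inspection, then restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

// Walk the tree that the parser built from the broken input.
$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphs[] = trim($p->textContent);
}

foreach ($paragraphs as $text) {
    echo $text, PHP_EOL;
}
```

The entries in $errors are LibXMLError objects (with message and line properties), so they can be logged if you want to know what the parser had to work around.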
