Differentiating between XHTML and HTML with PHP DOM

Posted 2019-07-05 06:09

Question:

I want to manipulate HTML and XHTML documents with the PHP DOM implementation. I use the DOMDocument->loadHTML() method to load the content.

I want to know whether the loaded content is XHTML or HTML. DOMDocument has a doctype object (a DOMDocumentType) which contains the DOCTYPE declaration from the document itself. So far I have thought about comparing $dom->doctype->publicId, which contains strings like "-//W3C//DTD HTML 4.01//EN".
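A minimal sketch of the publicId comparison described above, assuming you maintain your own list of known W3C public identifiers (the constant and function names are hypothetical, and the lists are not exhaustive). It passes LIBXML_HTML_NODEFDTD because libxml's HTML parser otherwise inserts a default HTML 4.0 Transitional doctype into documents that have none, which would make them look like HTML.

```php
<?php
// Hypothetical, non-exhaustive lists of W3C DOCTYPE public identifiers.
const XHTML_PUBLIC_IDS = [
    '-//W3C//DTD XHTML 1.0 Strict//EN',
    '-//W3C//DTD XHTML 1.0 Transitional//EN',
    '-//W3C//DTD XHTML 1.0 Frameset//EN',
    '-//W3C//DTD XHTML 1.1//EN',
];
const HTML_PUBLIC_IDS = [
    '-//W3C//DTD HTML 4.01//EN',
    '-//W3C//DTD HTML 4.01 Transitional//EN',
    '-//W3C//DTD HTML 4.01 Frameset//EN',
];

// Returns 'xhtml', 'html' or 'unknown' based on the DOCTYPE's public identifier.
function kindFromPublicId(string $markup): string
{
    $dom = new DOMDocument();
    // Collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    // LIBXML_HTML_NODEFDTD: don't add a default doctype when none is present.
    $dom->loadHTML($markup, LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($dom->doctype === null) {
        return 'unknown'; // no DOCTYPE declaration at all
    }
    $publicId = $dom->doctype->publicId;
    if (in_array($publicId, XHTML_PUBLIC_IDS, true)) {
        return 'xhtml';
    }
    if (in_array($publicId, HTML_PUBLIC_IDS, true)) {
        return 'html';
    }
    return 'unknown'; // e.g. <!DOCTYPE html>, whose public identifier is empty
}
```

Exact matching keeps false positives low, at the cost of returning 'unknown' for any identifier not in the lists.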

Can anyone think of a better way?

Edit:

Sorry if my question was a bit unclear. I updated it since it might have been confusing. To make it clear now: this question is not about handling HTML with PHP DOM in general, or about whether XHTML is good or bad.

Answer 1:

If you're loading from an external source, you can check the MIME type the file was served with and see whether it's application/xhtml+xml; if it is, it's almost certainly meant to be XHTML (of course the server can lie and send that type with horribly malformed markup). If it's text/html instead, it'll be parsed as HTML tag soup. Validity of the actual markup aside, the doctype declaration is your next best way of telling whether the content is (or claims to be) HTML or XHTML.
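For the MIME-type check, here is a sketch that classifies a Content-Type header value. It assumes you already have the header string from whatever HTTP client fetched the document; the function name is hypothetical.

```php
<?php
// Returns 'xhtml', 'html' or 'unknown' for a Content-Type header value.
function mediaTypeKind(string $contentType): string
{
    // Drop parameters such as "; charset=utf-8", then normalise case and spaces.
    $mediaType = strtolower(trim(explode(';', $contentType, 2)[0]));
    switch ($mediaType) {
        case 'application/xhtml+xml':
            return 'xhtml';
        case 'text/html':
            return 'html';
        default:
            return 'unknown';
    }
}
```

Parameters are stripped before comparing because servers commonly send values like `text/html; charset=utf-8`. Content classified as 'xhtml' this way could be loaded with `DOMDocument::loadXML()` instead of `loadHTML()`, though loadXML() will fail on markup that is not well-formed.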

As you say, you can check the public identifier and/or the system identifier (the DTD URI) and determine the type from there.
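A heuristic sketch of checking both identifiers: pass it `$dom->doctype->publicId` and `$dom->doctype->systemId`, which are empty strings when absent. It matches substrings rather than exact values and assumes W3C-style DTD names and paths, so treat it as an illustration under that assumption; the function name is hypothetical.

```php
<?php
// Returns 'xhtml', 'html' or 'unknown' from a DOCTYPE's public and system IDs.
function classifyDoctypeIds(string $publicId, string $systemId): string
{
    // Public identifiers look like "-//W3C//DTD XHTML 1.0 Strict//EN".
    if (stripos($publicId, '//DTD XHTML') !== false) {
        return 'xhtml';
    }
    if (stripos($publicId, '//DTD HTML') !== false) {
        return 'html';
    }
    // Fall back to the URI, e.g. ".../TR/xhtml1/DTD/xhtml1-strict.dtd"
    // versus ".../TR/html4/strict.dtd" (assumes W3C-style DTD paths).
    if (stripos($systemId, 'xhtml') !== false) {
        return 'xhtml';
    }
    if (stripos($systemId, 'html') !== false) {
        return 'html';
    }
    return 'unknown';
}
```

The public identifier is checked first because it is the more standardised of the two; the URI is only a fallback for documents that declare a SYSTEM identifier alone.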



Tags: php html dom xhtml