What's the need for XHTML?

Posted 2019-01-17 10:25

在下西门庆

Question:

In an interview I was asked a question that I'd never thought about, which was "We already have HTML which fulfills all the requirements of writing a web page, so what's the need for XHTML?"

I Googled a lot and also read many articles, but I'm not able to get properly why XHTML has been introduced. Please explain to me.

Answer 1:

I am actually writing this to ask why the above three posts, which speak about browser consistency and well-formed HTML, have been voted down.

As is well known, HTML is an industry standard. Browsers are implemented so that they render marked-up content as described in the HTML standard. Unfortunately, some areas were not well defined in HTML: what happens if the author forgets a closing tag, or what should a browser do if a referenced image is not found? Some browsers use the 'alt' attribute as placeholder text and some display it as a tooltip. The famous 'quirks' mode of browsers is a result of this lack of clarity. Because of this, the same web page could display differently in different browsers.

Also as HTML usage grew there was one more problem: it was not extensible - there was no way to add user-defined tags.

XHTML solves the above problems:

* It adopts XML to provide extensible tags.
* It provides a 'strict' standard for web browsers.

XHTML has well-defined rules about document structure, and these can be programmatically enforced. Check the various online "XHTML Validators". They will tell you whether your XHTML is well formed (and highlight the problem areas). Because of these strict rules your page is more or less guaranteed to look the same on all browsers implementing XHTML.

[note] if you want to verify the above, please refer to the text "Head First XHTML and CSS"

Answer 2:

Because it is valid XML. That helps a lot since you can use a lot of tools originally designed for XML, such as XML parsers, XSLT, XPath, XQuery, ...

Normal HTML is an SGML dialect, and that is not parsable without knowledge of the schema.

<ul>
    <li>one
    <li>two
    <li>three
</ul>

is correct HTML but not correct XML. If you want to parse that, you have to know that ul elements have to be closed but li elements don't.
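A minimal sketch of that difference, using Python's standard `xml.etree.ElementTree` as a stand-in for "a tool originally designed for XML". The helper name `parse_items` and the two sample strings are illustrative assumptions, not part of the original answer.

```python
import xml.etree.ElementTree as ET

# The list from the answer above, with the optional </li> tags omitted (valid HTML).
html_list = "<ul><li>one<li>two<li>three</ul>"
# The same list written as well-formed XML, as XHTML requires.
xhtml_list = "<ul><li>one</li><li>two</li><li>three</li></ul>"


def parse_items(markup):
    """Return the text of each <li>, or None if the markup is not well-formed XML."""
    try:
        root = ET.fromstring(markup)
    except ET.ParseError:
        # A generic XML parser has no schema knowledge, so it cannot guess
        # where the missing </li> tags belong.
        return None
    return [li.text for li in root.findall("li")]


print(parse_items(html_list))   # None
print(parse_items(xhtml_list))  # ['one', 'two', 'three']
```

The XML parser rejects the first string outright, while the second can be queried without any HTML-specific knowledge.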

Answer 3:

XHTML also allows you to embed other XML dialects like MathML, Ruby, SVG, etc. (You can also embed XHTML in other XML dialects, if desired.)

If you are just 'making a web page', you don't necessarily need XHTML. But if you are programmatically generating a page, you might find that the tools for generating XML are better than those that generate HTML.
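As a rough illustration of generating a page with XML tooling, the sketch below builds a small XHTML document with Python's standard `xml.etree.ElementTree`. The function `build_page` and its arguments are hypothetical names chosen for this example; the point is only that the library handles nesting, closing tags, and escaping.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_page(title, items):
    """Build a tiny XHTML document and return it as a string (illustrative helper)."""
    # Serialize the XHTML namespace as the default namespace (no prefix).
    ET.register_namespace("", XHTML_NS)
    html = ET.Element(f"{{{XHTML_NS}}}html")
    head = ET.SubElement(html, f"{{{XHTML_NS}}}head")
    ET.SubElement(head, f"{{{XHTML_NS}}}title").text = title
    body = ET.SubElement(html, f"{{{XHTML_NS}}}body")
    ul = ET.SubElement(body, f"{{{XHTML_NS}}}ul")
    for item in items:
        # Text is escaped on serialization, so "<" and "&" cannot break the markup.
        ET.SubElement(ul, f"{{{XHTML_NS}}}li").text = item
    return ET.tostring(html, encoding="unicode")


page = build_page("Fruits & nuts", ["apple", "a < b"])
print(page)
```

Because the tree is built from elements rather than concatenated strings, the output is well formed by construction and can be parsed back by any XML parser.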

Answer 4:

In addition to Johannes' answer, HTML is far too loose in its interpretations and tolerance, whereas XHTML's strict formalisation negates this.

Tolerance leads to variance, which leads to browser incompatibilities, which leads to the dark side.

Answer 5:

From Wiki:

Because they need to be well-formed, true XHTML documents allow for automated processing to be performed using standard XML tools—unlike HTML, which requires a relatively complex, lenient, and generally custom parser. XHTML can be thought of as the intersection of HTML and XML in many respects, since it is a reformulation of HTML in XML.

Having HTML conform to XML standards allows for much more consistent parsing of the page. In HTML, for example, you were allowed to have tags out of order, such as <b><u>test</b></u>; in XHTML you can't: tags must be closed in the reverse of the order in which they were opened. Things like this make DOM parsing (which is now used heavily in AJAX) much easier.
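The contrast can be sketched with two parsers from the Python standard library. `html.parser.HTMLParser` is only a lenient tokenizer, not a browser's tree builder, so treat this as an approximation; `TagLogger` and `is_well_formed` are names invented for the example.

```python
from html.parser import HTMLParser
import xml.etree.ElementTree as ET


class TagLogger(HTMLParser):
    """Record start/end tag events without checking that they nest correctly."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


misnested = "<p><b><u>test</b></u></p>"
nested = "<p><b><u>test</u></b></p>"

logger = TagLogger()
logger.feed(misnested)
logger.close()

print(logger.events)              # the lenient parser reports the tags as written
print(is_well_formed(misnested))  # False
print(is_well_formed(nested))     # True
```

The lenient parser leaves it to the caller to decide what the overlapping tags mean, which is exactly the ambiguity that XML's nesting rule removes.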

Answer 6:

I am sure you must have encountered this article from the W3. There is a lot to learn from it. In short, XHTML follows XML rules in addition to having the HTML set of tags. The most important differences:

* XHTML elements must be properly nested
* XHTML elements must always be closed
* XHTML elements must be in lowercase
* XHTML documents must have one root element

Answer 7:

I see a bunch of up-voted answers here that are making incorrect assumptions about how browsers work. So let me give my 2 cents on the matter.

First of all, why does XHTML exist?

From the horse's mouth:

a two-day workshop was organised to discuss whether a new version of HTML in XML was needed. The opinion at the workshop was a clear 'Yes': with an XML-based HTML other XML languages could include bits of XHTML, and XHTML documents could include bits of other markup languages. We could also take advantage of the redesign to clean up some of the more untidy parts of HTML, and add some new needed functionality, like better forms.

In short, XHTML was created for two reasons:

* To allow mixing other content (like MathML and SVG) in the same document with clear formatting rules.
* To extend and clean up HTML.

Making things easier to validate was not a design goal, and also not something that was necessary because HTML4 validators exist and are comprehensive.

Is XHTML easier to parse for browsers?

Yes and no. XML is easier to parse than HTML tag soup, but unless you use the application/xhtml+xml or application/xml MIME type for your XHTML page, browsers parse it using the HTML parsing engine. However, if you do use XML MIME types, IE chokes on your content. This behavior is explained on the IE blog. There is no difference in how browsers treat XHTML and HTML if you are serving it with a MIME type of text/html!

Yes they do! You lie!

Indeed they do, but only because of the doctype. Browsers use doctypes at the top of HTML documents to determine whether they should use standards mode or quirks mode (= bugs mode). All valid XHTML documents happen to include a doctype that triggers standards mode. However, in HTML you can get the same result by including "<!doctype html>" at the top of your page.

So are you saying XHTML has no purpose?

Not at all. XHTML has many advantages:

* It can be transformed using XML tools, like XSLT
* It can be parsed more easily in server-side code
* It can integrate custom markup while still passing a validation test

So, I should use it then?

As always, the answer is "it depends".

Server-side, possibly useful. If you want to have the server-side advantages of XML, you want to be using an XHTML variant, whether that is XHTML1 (HTML4 serialization as XML) or XHTML5 (HTML5 serialization as XML).
Client-side, not useful. I would highly recommend avoiding serving your users an XML mime type. XML parsing doesn't blend with graceful error handling, producing only an "XML parsing error" instead of a document if you have any markup issue in your page. Unless you never write bugs, you will need graceful error handling.

What about HTML5? Does it compete with XHTML?

No it doesn't. HTML5 has two serializations, one as HTML, and one as XML. The benefit is that both now have strict parsing rules. You will get predictable behavior in all browsers regardless of the approach you use. However, HTML5 parsed as HTML has the benefit of graceful error handling. That's why I prefer that approach. As always, YMMV.

Answer 8:

XHTML is an attempt to encourage the development of "well-formed" HTML.

HTML has evolved over more than 10 years. Its implementation, and the implementation of the browsers that parse and render it, are not exactly consistent. This is why cross-browser compatibility is a major headache.

HTML is based on SGML (Standard Generalized Markup Language.) XML is also derived from SGML, so they are cousins of a sort. XHTML marries the two, providing (in theory) the benefits of XML to HTML. This includes a well-defined schema that can be reliably validated, queried, and transformed.

Answer 9:

Why was XHTML created?

HTML is not very extensible. XHTML aimed to fix this by introducing namespaces so that languages such as MathML or SVG could be included inline.
XML is much simpler to parse than SGML (the format used by HTML before version 5).
Due to an overwhelming number of websites with errors, browsers attempted to correct incorrect markup. New browsers have had to attempt to correct it in the same way. XHTML tries to increase standards by specifying that only structurally correct code will display.
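The namespace mechanism mentioned in the first point can be sketched as follows. The sample document and the prefix map `NS` are assumptions made for illustration; the two namespace URIs are the standard ones for XHTML and SVG.

```python
import xml.etree.ElementTree as ET

# An XHTML fragment with an SVG drawing embedded inline via its own namespace.
doc = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <p>A circle:</p>
    <svg xmlns="http://www.w3.org/2000/svg" width="20" height="20">
      <circle cx="10" cy="10" r="8"/>
    </svg>
  </body>
</html>"""

# Prefixes here are local to this script; only the URIs have to match the document.
NS = {
    "x": "http://www.w3.org/1999/xhtml",
    "svg": "http://www.w3.org/2000/svg",
}

root = ET.fromstring(doc)
paragraphs = root.findall(".//x:p", NS)
circles = root.findall(".//svg:circle", NS)

print(paragraphs[0].text)   # A circle:
print(circles[0].get("r"))  # 8
```

Each element carries the namespace of the vocabulary it belongs to, so a tool can pick out the SVG parts without confusing them with XHTML elements.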

How well has it succeeded?

XHTML is widely spread, but almost always served with the text/html MIME type due to incompatibilities with Internet Explorer (up to version 8). Many of these pages would actually break if served as XML. So none of the three advantages above have really materialised.
Many people chose to use XHTML as they thought it would provide better future compatibility. Work has stopped on XHTML 2.0, and while HTML5 will have an XHTML serialisation, this seems to be receiving minimal attention. XHTML provides no future compatibility advantages for the foreseeable future. Mozilla and Safari recommend using just HTML.
HTML with a strict DTD already has a much cleaner format. HTML5 will take this further by removing the transitional DTD, removing unnecessary elements and defining a standard way for parsing documents with a degree of backwards compatibility. Browsers will still correct errors for the HTML serialisation, rather than forcing the markup to be fixed, but at least they will do it in the same way. Those who care about correct code will use validators anyway.

What is the need for XHTML?

XHTML had laudable goals and maybe it will be able to deliver in the future. I can't recommend XHTML for the possible future advantages it might provide, when HTML is much easier now. You should only really use XHTML if previous code or your tools force you to.

Answer 10:

I think that it helps browsers correctly display the HTML without making assumptions about where tags should be closed. Any time a browser assumes something, you know what happens.

Answer 11:

XHTML forces you to write cleaner code, which is easier to maintain, renders more consistently, and is easier to hook into the DOM. Comparing XHTML to HTML is like comparing a programming language that is strongly typed to one that is loosely typed.

As mentioned above, XHTML allows you to play with SVG and MathML. I'd like to add RDFa to that list. RDFa allows you to add semantics to your content that is not covered by microformats. I've personally been doing a lot with Dublin Core and Friend-of-a-Friend.

Answer 12:

XHTML is simply about communication between systems. HTML is very difficult to parse, because of the number of variations that can occur as to what is well formed. Since XML is strict in its interpretation, this problem has been removed.

Think about a RESTful architecture. If a URL is a permanent location for an item, then systems that want to access this item should be able to consume the information returned from accessing the URL. XHTML doesn't make this possible per se, because a system could already parse the HTML and retrieve the necessary information. XML just makes this easier. There is no limiting predefined set of tags that makes it difficult to classify data in a document (although technically you can do this in HTML too, because browsers will ignore unknown tags). You can use whatever you want to classify what data is retrieved.
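A minimal sketch of a client consuming such a resource, assuming it is served as well-formed XHTML. The sample document, the `class` values, and the helper `extract_fields` are all hypothetical; a real service would define its own vocabulary.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Hypothetical response body for a product URL.
resource = """<html xmlns="http://www.w3.org/1999/xhtml">
  <body>
    <div class="product">
      <span class="name">Widget</span>
      <span class="price">9.99</span>
    </div>
  </body>
</html>"""


def extract_fields(markup):
    """Map each span's class attribute to its text content."""
    root = ET.fromstring(markup)
    fields = {}
    for span in root.iter(f"{XHTML}span"):
        key = span.get("class")
        if key:
            fields[key] = span.text
    return fields


print(extract_fields(resource))  # {'name': 'Widget', 'price': '9.99'}
```

The same document is readable in a browser and consumable by a program, with no HTML-specific parser on the client side.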

Answer 13:

In a nutshell: XHTML is usually only beneficial and preferable to HTML when you want to use an XML-based tool to manipulate, transform, or generate HTML pages on the server side.

Lots of examples can be found in component-based MVC frameworks like ~~Sun~~ Oracle JSF, which uses Facelets as an XHTML-based view technology. The server-side components are defined in XSDs and the pages are parsed using a SAX parser. You can even add a <!DOCTYPE html> to the top of the page to let Facelets generate "pure" valid and strict HTML5. Microsoft ASP.NET MVC has a similar view technology.

When you're hand-writing HTML, XHTML doesn't add much benefit, unless the goal is to show off the "coolness" of using an (over)hyped technology.

Answer 14:

If I want to crawl your site and parse its contents, I can only do it if it's XML.

Parsing HTML is a nightmare.

Answer 15:

XML is a data interchange format. This is perfect for building websites because, after all, we are dealing with information, and this information needs to be crawled and understood by computers (such as search engines).

Answer 16:

Because XHTML makes a lot more sense!

The point is, even though something might not provide any more technical possibilities, it's still an improvement if it's remade just to be clearer and more logical. That's why code refactoring is a good idea even if it doesn't change any of the functionality. That's why Brainfuck wouldn't be a good programming language, even if it had all the capabilities of Java.

XHTML makes more sense because the underlying structure of tags and their attributes is always consistent and does not depend on tag semantics. How it makes more sense is fairly evident once you are familiar with its differences from HTML: for example, tags are always properly nested, all tags must be closed, names must be lowercase, and attribute values must be quoted.