What problem does XHTML strict solve?

Posted 2019-03-09 19:28

Question:

I really don't understand the fascination with XHTML Strict. Inline JavaScript typically requires a rat's nest of escapes to make it compatible with XHTML and semi-backwards compatible with MSIE 5 & 6. Then there is the issue of not being OCD enough on user input to make sure you don't miss any illegal characters. It just seems like more effort than it's worth. Never mind that almost every developer I've worked alongside keeps forgetting to ensure the content-type returned from the server is reset for XHTML pages from text/html to application/xhtml+xml.

I wish I could remember the blogger's name, but someone else pointed out that the majority of supposedly XHTML-compliant websites and open-source packages actually aren't compliant, because of that last issue: forgetting to set the Content-Type header correctly.

I'm looking to understand why XHTML is useful, or build enough of an arsenal of arguments to prevent it ever being used in future projects that I have influence on.

Answer 1:

XHTML1 vs HTML4 and Strict vs Transitional are completely orthogonal issues.

XML might not give any huge advantage to browsers today, but on the server end it's an order of magnitude easier to process documents using XML than trying to parse the mess that is old-school-SGML-except-not-really HTML4.
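To make the server-side point concrete, here is a minimal Python sketch using only the standard library's xml.etree.ElementTree. It assumes the page is well-formed XHTML in the standard XHTML namespace, with no DOCTYPE and no named entities such as &nbsp; (ElementTree does not know those by default); the sample page and the extract_links helper are hypothetical.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical well-formed XHTML page (no DOCTYPE, no named entities).
SAMPLE_PAGE = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body>
    <p>See <a href="/docs">the docs</a> and <a href="/faq">the FAQ</a>.</p>
  </body>
</html>"""


def extract_links(xhtml: str) -> list[str]:
    """Return the href of every <a> element, in document order."""
    root = ET.fromstring(xhtml)
    ns = {"h": XHTML_NS}  # map a local prefix to the XHTML namespace
    links = []
    for anchor in root.iterfind(".//h:a", ns):
        href = anchor.get("href")
        if href is not None:
            links.append(href)
    return links


print(extract_links(SAMPLE_PAGE))  # ['/docs', '/faq']
```

With HTML4 tag soup, the same job needs a forgiving HTML parser that reconstructs the tree the way browsers do; here any off-the-shelf XML parser will do.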

Restricting yourself to [X]HTML Strict doesn't achieve anything in itself, other than simply that it discourages the use of old, less-maintainable techniques you shouldn't be using anyway.

"Inline JavaScript typically requires a rat's nest of escapes to make it compatible with XHTML"

You can get away without any escapes as long as you don't use the characters < or &. And ‘//<![CDATA[’ isn't really much worse than ‘<!--’ was in the old days.
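A small Python check illustrates the point about < and &: the standard-library XML parser rejects a script body that contains them, and accepts the same code once it sits inside a commented-out CDATA section. The script contents and the is_well_formed helper are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical inline script using "<" and "&&", both markup-significant in XML.
RAW = """<script type="text/javascript">
if (a < b && b < c) { go(); }
</script>"""

# Same script wrapped in a CDATA section. The "//" prefixes make the marker
# lines JavaScript comments, while an XML parser treats the body as plain text.
WRAPPED = """<script type="text/javascript">
//<![CDATA[
if (a < b && b < c) { go(); }
//]]>
</script>"""


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(RAW))      # False
print(is_well_formed(WRAPPED))  # True
# Inside CDATA the script text survives unchanged.
print("a < b && b < c" in ET.fromstring(WRAPPED).text)  # True
```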

In any case, keeping the scripting external is much more manageable; you don't want to be doing anything significant inline.

"Then there is the issue of not being OCD enough on user input to make sure you don't miss any illegal characters."

Out-of-band characters are exactly as invalid in HTML4 Transitional as in XHTML1 Strict.

If you're accepting user-submitted HTML and not checking/escaping it with a fine enough tooth comb to prevent well-formedness errors, you have much bigger problems than just complying with a doctype. You'll be letting injection hacks through and making your site vulnerable to cross-site scripting (XSS) security holes.
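Here is a minimal Python sketch of the escaping step, using html.escape from the standard library. The comment text and the render_comment helper are hypothetical, and the sketch escapes everything rather than allowing a safe subset of user HTML, which would need a proper sanitizer.

```python
import html
import xml.etree.ElementTree as ET


def render_comment(user_text: str) -> str:
    """Embed untrusted text in a <p>, escaping &, <, > and quotes."""
    return "<p>" + html.escape(user_text, quote=True) + "</p>"


def is_well_formed(fragment: str) -> bool:
    """True if the fragment parses as XML."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Hypothetical user-submitted comment.
comment = 'Tom & Jerry <script>alert("x")</script>'

naive = "<p>" + comment + "</p>"  # raw interpolation: unsafe and malformed
safe = render_comment(comment)

print(is_well_formed(naive))  # False: the bare "&" breaks XML
print(is_well_formed(safe))   # True
# The parser hands back exactly what the user typed, as text, not markup.
print(ET.fromstring(safe).text == comment)  # True
```

Well-formedness is not the security check here: a comment without an ampersand, such as <script>alert(1)</script>, would parse fine unescaped and still be an injection. The escaping is what closes both holes.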

"forgetting to ensure the content-type returned from the server is reset for XHTML pages from text/html to application/xhtml+xml."

It's not ‘forgetting’, it's deliberate: there is not really that much point in serving application/xhtml+xml today. To account for IE you have to sniff UA, and then make sure you understand the CSS and JavaScript differences that pop up in both parsing modes... you can do it to prove your technical prowess, but it doesn't really get you anything.
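This answer mentions sniffing the User-Agent; a common alternative is to negotiate on the HTTP Accept header instead. The Python sketch below does that (a swapped-in technique, not what the answer describes). The function names are hypothetical and the q-value parsing is deliberately simplified. Internet Explorer before version 9 did not list application/xhtml+xml in its Accept header, so it falls through to text/html.

```python
from __future__ import annotations

# Assumption: negotiate on the Accept header rather than sniffing User-Agent.
XHTML = "application/xhtml+xml"
HTML = "text/html"


def parse_accept(header: str) -> dict[str, float]:
    """Map each media range in an Accept header to its q-value (default 1.0)."""
    prefs: dict[str, float] = {}
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs[media] = q
    return prefs


def choose_content_type(accept_header: str | None) -> str:
    """Serve application/xhtml+xml only to clients that explicitly ask for it."""
    if not accept_header:
        return HTML
    prefs = parse_accept(accept_header)
    xhtml_q = prefs.get(XHTML, 0.0)  # a bare */* does not count as XHTML support
    html_q = prefs.get(HTML, prefs.get("text/*", prefs.get("*/*", 0.0)))
    return XHTML if xhtml_q > 0 and xhtml_q >= html_q else HTML


# Illustrative headers: one that names XHTML, one that only has */*.
print(choose_content_type("text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"))
print(choose_content_type("image/gif, image/jpeg, */*"))
```

If you vary the Content-Type this way, also send a Vary: Accept response header so shared caches do not hand one client's variant to another.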

Serving XHTML as legacy HTML may not be ideal, but it lets you keep the simpler, more processable syntax of XML (and potential interoperability with other XML languages like SVG) whilst still being browser-friendly.

People complain about the pickiness of the well-formedness errors, but having those errors picked up straight away for you to fix them is way better than leaving them there silently, ready to trip up some future browser.
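In Python, the standard-library XML parser makes this concrete: it stops at the first well-formedness error and reports where it is. The mis-nested page fragment and the first_error helper below are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

# Hypothetical fragment: </p> closes before </em> on line 3.
PAGE = """<div>
  <p>First paragraph</p>
  <p>Second <em>paragraph</p></em>
</div>"""


def first_error(markup: str) -> Optional[Tuple[int, int, str]]:
    """Return (line, column, message) for the first XML error, or None if well-formed."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as err:
        line, column = err.position
        return line, column, str(err)
    return None


print(first_error(PAGE))  # reports line 3 with a "mismatched tag" message
print(first_error("<div><p>fine</p></div>"))  # None
```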



Answer 2:

There is a great post about the usage of XHTML: "Beware of XHTML".

Hope it helps, Bruno Figueiredo



Answer 3:

XHTML 1.0 Strict tries to solve four problems:

  1. XML is W3C technology and HTML4 wasn’t using it. Not your problem.

  2. Strict seeks to be more theoretically pure than Transitional when it comes to presentationalism. But this is not an XHTML vs. HTML issue.

  3. An XML parser is supposedly simpler. (Not entirely true; the code for dealing with the DTD part is pretty complex.) These days, you get both XML and HTML parsers off the shelf, so this isn’t your problem. (Aside: the mobile argument is utterly bogus.)

  4. application/xhtml+xml (though not valid XHTML 1.0 Strict!) allows you to mix in other vocabularies. If you want to use inline MathML or SVG today, this is the main reason to use application/xhtml+xml. However, the HTML5 work is heading toward making it possible to use MathML and SVG in text/html.
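To illustrate point 4, here is a Python sketch that parses a hypothetical XHTML page with an inline SVG island and counts elements per namespace. Served as application/xhtml+xml, the two vocabularies are told apart purely by their namespace URIs; the vocabularies helper is made up for the example.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical XHTML page with an inline SVG island.
PAGE = f"""<html xmlns="{XHTML_NS}">
  <body>
    <p>A red circle:</p>
    <svg xmlns="{SVG_NS}" width="20" height="20">
      <circle cx="10" cy="10" r="8" fill="red"/>
    </svg>
  </body>
</html>"""


def vocabularies(markup: str) -> dict[str, int]:
    """Count elements per namespace URI (ElementTree tags look like '{uri}name')."""
    counts: dict[str, int] = {}
    for el in ET.fromstring(markup).iter():
        ns = el.tag[1:].split("}", 1)[0] if el.tag.startswith("{") else ""
        counts[ns] = counts.get(ns, 0) + 1
    return counts


print(vocabularies(PAGE))  # 3 XHTML elements (html, body, p), 2 SVG (svg, circle)
```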



Answer 4:

XHTML is useful because it's much easier to create a simple transforming stylesheet or roll your own parser for it, than it is for HTML.
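As an illustration of rolling your own transform, here is a Python sketch using ElementTree that builds a linked table of contents from the h2 headings of a hypothetical XHTML page. The build_toc name and the assumption that each heading carries an id attribute are made up for the example.

```python
import xml.etree.ElementTree as ET

NS = "http://www.w3.org/1999/xhtml"
H = "{%s}" % NS  # ElementTree's '{uri}' prefix for XHTML element names

# Hypothetical input page.
PAGE = f"""<html xmlns="{NS}"><body>
  <h2 id="intro">Introduction</h2><p>...</p>
  <h2 id="usage">Usage</h2><p>...</p>
</body></html>"""


def build_toc(markup: str) -> str:
    """Turn every <h2 id="..."> into an item of a linked <ul> table of contents."""
    # Serialize XHTML as the default namespace instead of an "ns0:" prefix.
    # Note: register_namespace changes module-wide state.
    ET.register_namespace("", NS)
    root = ET.fromstring(markup)
    toc = ET.Element(H + "ul")
    for heading in root.iter(H + "h2"):
        anchor_id = heading.get("id")
        if anchor_id is None:
            continue  # skip headings we cannot link to
        item = ET.SubElement(toc, H + "li")
        link = ET.SubElement(item, H + "a", href="#" + anchor_id)
        link.text = "".join(heading.itertext())
    return ET.tostring(toc, encoding="unicode")


print(build_toc(PAGE))
```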



Answer 5:

Do you have to parse your HTML with a program, or for some tests? Then use XHTML.

For everything else, HTML 4.01 (strict, loose, transitional, whatever) is perfectly "standard" and less "troublesome".
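The difference is easy to see with Python's standard library. The snippet below is valid HTML 4.01 (the end tags for li and p are optional); the tolerant html.parser module reads it happily, while the XML parser rejects it. The snippet, the TagCounter class and the xml_accepts helper are hypothetical. Note that html.parser only reports events and leaves the implied end tags for you to infer.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Valid HTML 4.01 (optional end tags omitted), but not well-formed XML.
HTML4_SNIPPET = "<ul><li>one<li>two</ul><p>first<p>second"


class TagCounter(HTMLParser):
    """Record start tags using the tolerant, event-based stdlib HTML parser."""

    def __init__(self) -> None:
        super().__init__()
        self.start_tags: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)


def xml_accepts(markup: str) -> bool:
    """True if the markup parses as XML."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


counter = TagCounter()
counter.feed(HTML4_SNIPPET)
counter.close()
print(counter.start_tags)          # ['ul', 'li', 'li', 'p', 'p']
print(xml_accepts(HTML4_SNIPPET))  # False
```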



Answer 6:

XHTML enables advanced rendering such as SVG (Scalable Vector Graphics), which is itself an XML language and can easily be embedded in XHTML through XML namespaces, without <embed> or <object>. Unfortunately, only Firefox and Safari support it. Sorry, IE6 users.

For more on SVG, see http://en.wikipedia.org/wiki/Svg



Answer 7:

XHTML makes HTML consistent with all the other XML-based structures in our universe, which has two primary benefits:

  1. Design patterns we use in dealing with XML can be applied to HTML.

  2. The same goes for software tools.



Answer 8:

XHTML has the advantages of XML. But then why the Strict variant?

I see some similarities with deprecated functions: you can still use them in this version, but they may be removed in the next one. So I see the Transitional version as deprecated usage. It still works, and it will keep working for a couple of versions, but if you want to build for the future, use the Strict version.



Answer 9:

Strict is intended to formalize the separation between content and style by making it more difficult to commingle the two. Elliotte Rusty Harold has a good write-up on XHTML in one of his books; here's the relevant excerpt on 'Why XHTML'.



Answer 10:

The only thing I've seen solved by XHTML is the "problem" of users using Safari: I don't know if the bug is still there, but when we were last asked to write in XHTML, we ran across a bug that made XHTML unusable with Safari. In XHTML, the following URL isn't allowed in anchor tags, because the ampersand isn't escaped:

http://www.example.com/page.php?arg1=val1&arg2=val2

so what you have to do is replace it with &amp; like this:

http://www.example.com/page.php?arg1=val1&amp;arg2=val2

but Safari converts &amp; to &#38; so you get this URL:

http://www.example.com/page.php?arg1=val1&#38;arg2=val2

...and the hash symbol ends the URL as far as PHP is concerned. I know that there are ugly hacks that allow you to pass two variables in other ways, but if XHTML is going to force you to use ugly hacks, then you're better off without it.
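For reference, here is how a conforming XML parser treats the two spellings from this answer. The Python sketch below (standard library only; it does not reproduce Safari's behaviour) escapes the URL for an attribute, then parses it written both as &amp; and as &#38;. Both resolve to a literal ampersand, so in each case the href that comes back is the original URL with two query parameters. The href_after_parsing helper is hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlsplit

URL = "http://www.example.com/page.php?arg1=val1&arg2=val2"


def href_after_parsing(attribute_value: str) -> str:
    """Parse <a href="..."> and return the decoded attribute value."""
    return ET.fromstring(f'<a href="{attribute_value}">link</a>').get("href")


# Escaping for an attribute turns "&" into "&amp;".
with_named = html.escape(URL, quote=True)
# The same ampersand written as a numeric character reference.
with_numeric = with_named.replace("&amp;", "&#38;")

for spelling in (with_named, with_numeric):
    href = href_after_parsing(spelling)
    print(href == URL, parse_qs(urlsplit(href).query))
# Both lines print: True {'arg1': ['val1'], 'arg2': ['val2']}
```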



Answer 11:

Personally, I liked the concept of XHTML: much cleaner than most of the HTML we see, and easier to parse and validate. Like everybody, I started coding XHTML pages. By the way, I don't see an issue with inline JavaScript: there's no need for escapes if you put the code in CDATA. And IE5 is fortunately mostly gone from the browser landscape, like Netscape 4, which forced us to write " />" (with a space before the slash) instead of "/>", something I still sometimes see in pure XML...

Now, I have read a number of articles, like the one linked by Bruno, which has a lot of good arguments against its use in most cases. Basically, it says most browsers just aren't ready for strict XHTML (served as XML), it doesn't make much sense to serve XHTML as HTML, and anyway it isn't that useful for the majority of sites.

Look at the arguments above: they are perfectly valid, and it is great to be able to put MathML or SVG directly in the page, to transform the XML with an XSLT processor, or to process the page with an XML parser.

But how often do you do that? Parsing the page is most often a problem for end users, who can use a good HTML parser. And given how few browsers can handle MathML, SVG or XSLT, it is more a need for intranets than for the wider Internet.

You can have an e-commerce site, a blog or a forum that spits out good XHTML pages, and the people writing the descriptions, articles or messages will still insert <p><p><p> to skip some lines, if not <p/> or some other exotic construct...

I believe in XHTML, but I think I will no longer use it for the little pages I make for my site. I will use HTML 4 with well-written code (quoted attributes, closing tags even where optional, etc.).
And after all, if the W3C is working on HTML 5, it is for a reason: HTML still has a life ahead of it; otherwise it would have been killed in favor of XHTML 2.



Answer 12:

XHTML is by definition XML, unlike HTML.

This means you can do funky useful stuff with it, such as easily validate and parse it (since you know it's XML and thus can use the myriad of tools available).

Also, geeks like to make things "more correct" ;-)



Answer 13:

This is a global standards issue

This is not just about XHTML, but about all the standards in the world. You need to make things clearer from version to version.

XHTML is rigorous and pushes coders to add semantic value to the code. It's fully XML-compatible and therefore more easily parseable, styleable, etc.

Remember that code is not just for coders but for machines too. In 10 years, people creating browsers or libraries won't want to implement the same complex rules for processing old HTML; they will expect something as clean as possible.

Search engines need something reliable to build semantic links between values, so it's better if there is only one easy way to do it.

And I'm not even talking about screen readers...

A standard is, above all, about moving toward one open solution that fits everybody's needs, not just about adding shiny new features.