Do I really need to encode '&' as '&amp;'?

Posted 2018-12-31 17:29

I'm using an '&' symbol with HTML5 and UTF-8 in my site's <title>. Google shows the ampersand fine on its SERPs, as do all the browsers in their titles.

http://validator.w3.org is giving me this:

& did not start a character reference. (& probably should have been escaped as &amp;.)

Do I really need to do &amp;?

I'm not fussed about my pages validating for the sake of validating, but I'm curious to hear people's opinions on this and if it's important and why.

17 answers
公子世无双
#2 · 2018-12-31 18:23

In HTML, a & marks the beginning of a reference, either a character reference or an entity reference. From that point on the parser expects either a # denoting a character reference or an entity name denoting an entity reference, both followed by a ;. That’s the normal behavior.

But if the reference name, or just the opening &, is followed by white space or another delimiter such as ", ', <, > or &, the ending ; can be omitted, and a plain & does not even need to be written as a reference:

<p title="&amp;">foo &amp; bar</p>
<p title="&amp">foo &amp bar</p>
<p title="&">foo & bar</p>

Only in these cases can the ending ; or even the reference itself be omitted (at least in HTML 4). I think HTML 5 requires the ending ;.

But the specification recommends always using a reference, such as the character reference &#38; or the entity reference &amp;, to avoid confusion:

Authors should use "&amp;" (ASCII decimal 38) instead of "&" to avoid confusion with the beginning of a character reference (entity reference open delimiter). Authors should also use "&amp;" in attribute values since character references are allowed within CDATA attribute values.
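
The three spellings in the example above can be checked with a small Python sketch. It uses the standard library's html.unescape, which follows the HTML 5 decoding rules, as a stand-in for a browser's parser. That substitution is an assumption made for illustration and is not part of the answer. The last call shows why the recommendation exists: a legacy name such as copy is recognized even without a semicolon and with other text directly after it.

```python
import html

# The spellings from the example, plus the numeric form, decode to the same text.
for markup in ["foo &amp; bar", "foo &amp bar", "foo & bar", "foo &#38; bar"]:
    print(repr(markup), "->", repr(html.unescape(markup)))

# A name that requires its semicolon is left alone when the semicolon is missing.
print(repr(html.unescape("a &rarr b")))

# A legacy name is decoded even without a semicolon, so an unescaped &
# directly in front of it silently changes the text.
print(repr(html.unescape("terms&copyright")))
```

This is text-content behavior; real browsers apply slightly different rules inside attribute values, which the sketch does not model.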

人间绝色
#3 · 2018-12-31 18:24

Well, if it comes from user input then absolutely yes, for obvious reasons. Imagine if this very website didn't do it: the title of this question would show up as "Do I really need to encode ‘&’ as ‘&’?"

If it's just something like echo '<title>Dolce & Gabbana</title>'; then strictly speaking you don't have to. It would be better, but if you don't, no user will notice the difference.
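
For the user-input case, here is a minimal sketch in Python. The snippet in the answer is PHP; Python's html.escape is swapped in purely for illustration, and render_title is a hypothetical helper name, not something from the answer.

```python
import html

def render_title(text: str) -> str:
    """Wrap text in a <title> element, escaping &, < and >.

    quote=False leaves ' and " alone; they only need escaping
    inside attribute values, not in element content.
    """
    return f"<title>{html.escape(text, quote=False)}</title>"

print(render_title("Dolce & Gabbana"))
print(render_title("Do I really need to encode '&' as '&amp;'?"))
```

The second call is the case the answer describes: the literal text &amp; in the input has to become &amp;amp; in the markup, otherwise the page shows a bare & where the user typed &amp;.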

低头抚发
#4 · 2018-12-31 18:26

A couple of years ago, we got a report that one of our web apps wasn't displaying correctly in Firefox. It turned out that the page contained a tag that looked like

<div style="..." ... style="...">

When faced with a repeated style attribute, IE combines both of the styles, while Firefox only uses one of them, hence the different behavior. I changed the tag to

<div style="...; ..." ...>

and sure enough, it fixed the problem! The moral of the story is that browsers have more consistent handling of valid HTML than of invalid HTML. So, fix your damn markup already! (Or use HTML Tidy to fix it.)
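
A repeated attribute like the one above is easy to miss by eye. As a rough sketch, assuming Python's standard html.parser module is an acceptable stand-in for a real validator or HTML Tidy, the following reports tags that carry the same attribute more than once. DuplicateAttrFinder and find_duplicate_attrs are hypothetical names chosen for this example.

```python
from collections import Counter
from html.parser import HTMLParser

class DuplicateAttrFinder(HTMLParser):
    """Collect (tag, attribute) pairs where an attribute is repeated."""

    def __init__(self):
        super().__init__()
        self.duplicates = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; repeats are kept in it.
        counts = Counter(name for name, _ in attrs)
        for name, n in counts.items():
            if n > 1:
                self.duplicates.append((tag, name))

def find_duplicate_attrs(markup: str) -> list:
    finder = DuplicateAttrFinder()
    finder.feed(markup)
    finder.close()
    return finder.duplicates

print(find_duplicate_attrs('<div style="color:red" id="x" style="margin:0">hi</div>'))
print(find_duplicate_attrs('<div style="color:red; margin:0" id="x">hi</div>'))
```

It only flags the problem; merging the values, as in the fix above, still has to be done by hand or by a dedicated tool.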

低头抚发
#5 · 2018-12-31 18:26

The link below has a fairly good example of when and why you may need to escape & as &amp;.

https://jsfiddle.net/vh2h7usk/1/

Interestingly, I had to escape the character in order to represent it properly in my answer here. If I use the built-in code sample option (from the answer panel), I can just type &amp; and it appears as it should. But if I use the <code></code> element manually, I have to escape it in order to represent it correctly :)

看淡一切
#6 · 2018-12-31 18:30

It depends on the likelihood of a semicolon ending up near your &, causing it to display something quite different.

For example, when dealing with input from users (say, if you include the user-provided subject of a forum post in your title tags), you never know where they might be putting random semicolons, and it might randomly display strange entities. So always escape in that situation.

For your own static HTML, sure, you could skip it, but proper escaping is so trivial to include that there's no good reason to avoid it.
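
The stray-semicolon risk can be sketched in Python, again using html.unescape as an approximation of what a browser does with text content (an assumption for illustration). The subject string is a made-up example of user input.

```python
import html

# Hypothetical user-supplied forum subject that happens to contain "&copy;".
subject = "Q&A: is &copy; the same as (c)?"

# Inserted as-is, the reference is decoded and the visible text changes.
shown_raw = html.unescape(subject)

# Escaped first, the decoded text is exactly what the user typed.
shown_escaped = html.unescape(html.escape(subject, quote=False))

print(shown_raw)
print(shown_escaped)
print(shown_escaped == subject)
```

Note that the & in "Q&A" survives unescaped only because "A:" is not a known reference name, which is the kind of luck the answer warns against relying on.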
