Do I really need to encode '&' as '&am

2018-12-31 17:29发布

I'm using an '&' symbol with HTML5 and UTF-8 in my site's <title>. Google shows the ampersand fine on its SERPs, as do all the browsers in their titles.

http://validator.w3.org is giving me this:

& did not start a character reference. (& probably should have been escaped as &amp;.)

Do I really need to do &amp;?

I'm not fussed about my pages validating for the sake of validating, but I'm curious to hear people's opinions on this and if it's important and why.

17条回答
忆尘夕之涩
2楼-- · 2018-12-31 18:16

Validation aside, the fact remains that encoding certain characters is important to an HTML document so that it can render properly and safely as a web page.

Encoding & as &amp; under all circumstances, for me, is an easier rule to live by, reducing the likelihood of errors and failures.

Compare the following: which is easier? which is easier to bugger up?

Methodology 1

  1. Write some content which includes ampersand characters.
  2. Encode them all.

Methodology 2

(with a grain of salt, please ;) )

  1. Write some content which includes a ampersand characters.
  2. On a case-by-case basis, look at each ampersand. Determine if:
    • It is isolated, and as such unambiguously an ampersand. eg. volt & amp
       > In that case don't bother encoding it.
    • It is not isolated, but you feel it is nonetheless unambiguous, as the resulting entity does not exist and will never exist since the entity list could never evolve. eg amp&volt
       > In that case don't bother encoding it.
    • It is not isolated, and ambiguous. eg. volt&amp
       > Encode it.

??

查看更多
永恒的永恒
3楼-- · 2018-12-31 18:16

if & is used in html then you should escape it

If & is used in javascript strings e.g. an alert('This & that'); or document.href you don't need to use it.

If you're using document.write then you should use it e.g. document.write(<p>this &amp; that</p>)

查看更多
一个人的天荒地老
4楼-- · 2018-12-31 18:17

Yes. Just as the error said, in HTML, attributes are #PCDATA meaning they're parsed. This means you can use character entities in the attributes. Using & by itself is wrong and if not for lenient browsers and the fact that this is HTML not XHTML, would break the parsing. Just escape it as &amp; and everything would be fine.

HTML5 allows you to leave it unescaped, but only when the data that follows does not look like a valid character reference. However, it's better just to escape all instances of this symbol than worry about which ones should be and which ones don't need to be.

Keep this point in mind; if you're not escaping & to &amp;, it's bad enough for data that you create (where the code could very well be invalid), you might also not be escaping tag delimiters, which is a huge problem for user-submitted data, which could very well lead to HTML and script injection, cookie stealing and other exploits.

Please just escape your code. It will save you a lot of trouble in the future.

查看更多
浅入江南
5楼-- · 2018-12-31 18:17

I’ve researched this thoroughly and wrote about my findings here: http://mathiasbynens.be/notes/ambiguous-ampersands

I’ve also created an online tool that you can use to check your markup for ambiguous ampersands or character references that don’t end with a semicolon, both of which are invalid. (No HTML validator currently does this correctly.)

http://i.imgur.com/cLssU.png

查看更多
浪荡孟婆
6楼-- · 2018-12-31 18:22

I was checking why Image URL's need escaping, hence tried it in https://validator.w3.org. The explanation is pretty nice. It highlights that even URL's need to be escaped. [PS:I guess it will unescaped when its consumed since URL's need &. Can anyone clarify?]

<img alt="" src="foo?bar=qut&qux=fop" />

An entity reference was found in the document, but there is no reference by that name defined. Often this is caused by misspelling the reference name, unencoded ampersands, or by leaving off the trailing semicolon (;). The most common cause of this error is unencoded ampersands in URLs as described by the WDG in "Ampersands in URLs". Entity references start with an ampersand (&) and end with a semicolon (;). If you want to use a literal ampersand in your document you must encode it as "&" (even inside URLs!). Be careful to end entity references with a semicolon or your entity reference may get interpreted in connection with the following text. Also keep in mind that named entity references are case-sensitive; &Aelig; and æ are different characters. If this error appears in some markup generated by PHP's session handling code, this article has explanations and solutions to your problem.

查看更多
不再属于我。
7楼-- · 2018-12-31 18:22

If you're really talking about the static text

<title>Foo & Bar</title>

stored in some file on the hard disk and served directly by a server, then yes: it probably doesn't need to be escaped.

However, since there is very little HTML content nowadays that's completely static, I'll add the following disclaimer that assumes that the HTML content is generated from some other source (database content, user input, web service call result, legacy API result, ...):

If you don't escape a simple &, then chances are you also don't escape a &amp; or a &nbsp; or <b> or <script src="http://attacker.com/evil.js"> or any other invalid text. That would mean that you are at best displaying your content wrongly and more likely are suspectible to XSS attacks.

In other words: when you're already checking and escaping the other more problematic cases, then there's almost no reason to leave the not-totally-broken-but-still-somewhat-fishy standalone-& unescaped.

查看更多
登录 后发表回答