Numerical character reference entities… Nomenclatu

Posted 2019-08-19 12:46

Question:

It used to be so simple. Or so I thought.

  • nbsp is an entity
  • &nbsp; is, therefore, an entity reference (a reference to an entity)
  • &#160; is a character reference (a reference to a numerical character value)
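
As a rough illustration of the distinction in this list, here is a minimal Python sketch using the standard library's `html.unescape`. The variable names are made up for the example; the point is only that both spellings refer to the same character.

```python
import html

named = html.unescape("&nbsp;")     # reference by entity name
numeric = html.unescape("&#160;")   # reference by decimal code point

# Both spellings resolve to the same character, NO-BREAK SPACE (U+00A0).
same_character = named == numeric == "\u00a0"
```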

But these days, I read so many documents, even official ones, where those words are all mangled together; you have character entities, named character references, numerical entities, reference entities, and so on.

So what is it really? What are these things really called? Who can I trust to have it right these days?

Edit: the resolution so far is that &nbsp; and &#160; have names ending in "reference" (although what's before the "reference" varies between HTML4, HTML5 and XML). If you call them something ending in "entity", you're most likely incorrect.

Answer 1:

You are correct except that nbsp is not an entity but an entity name. The entity is the thing that the entity reference refers to, in this case the no-break space character.
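
To make the three layers concrete, here is a small sketch in Python. It assumes the standard library table `html.entities.name2codepoint` (which holds the HTML 4 entity definitions); the variable names are only illustrative.

```python
from html.entities import name2codepoint

entity_name = "nbsp"                       # the entity name
code_point = name2codepoint[entity_name]   # the code point the name maps to
entity = chr(code_point)                   # the entity itself: the no-break space character
entity_reference = f"&{entity_name};"      # the entity reference as written in markup
```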

The entity reference can also be called named entity reference (since SGML in general allows other types of entity reference, too). Similarly, the character reference can be called numeric character reference (to distinguish it from certain SGML concepts that never applied in HTML).

This is the SGML (ISO 8879) terminology that HTML specifications, up to and including HTML 4.01, nominally adhere to through their formal references to the SGML standard.

(Even HTML specifications use SGML terms sloppily, though. And in fact, HTML was never implemented as SGML-based, though some features of SGML are reflected in implementations.)

XHTML is based on XML, which is a simplification of SGML and formally defined as standalone. XML uses the terms entity reference and character reference, like SGML, but the longer names don’t apply.

HTML5 is something different: designed to be independent of SGML and XML. It also introduces its own terminology.



Answer 2:

I am basing this answer on the HTML5 specification, which I usually treat as trustworthy, although it is a working draft so subject to change.

nbsp is a "character reference name" (but the spec also calls it an "entity name")

&nbsp; is a "named character reference"

&#160; is a "decimal numeric character reference"

There is another option too:

&#x2020; (†) is a "hexadecimal numeric character reference"
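
The three HTML5-style names can be told apart purely by syntax. The following is a rough sketch only: `classify_reference` is a hypothetical helper, and its patterns check the shape of a reference, not whether a name actually appears in the spec's list of named character references.

```python
import re

# Hypothetical helper: labels a reference using the HTML5-style terms above.
# Order matters only for readability; the three patterns cannot overlap.
_PATTERNS = [
    (re.compile(r"&#[xX][0-9a-fA-F]+;"), "hexadecimal numeric character reference"),
    (re.compile(r"&#[0-9]+;"), "decimal numeric character reference"),
    (re.compile(r"&[A-Za-z][A-Za-z0-9]*;"), "named character reference"),
]

def classify_reference(text: str) -> str:
    """Return the term for `text`, judged by its syntax alone."""
    for pattern, label in _PATTERNS:
        if pattern.fullmatch(text):
            return label
    return "not a character reference"

labels = [classify_reference(s) for s in ("&nbsp;", "&#160;", "&#xA0;")]
```

Here `labels` holds one term per input, in the order named, decimal, hexadecimal; a bare name such as `nbsp` without the `&` and `;` delimiters is not classified as a reference at all.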