According to the W3C XML Recommendation, start tags have the definition:
STag ::= '<' Name (S Attribute)* S? '>'
..where Name is:
Name ::= NameStartChar (NameChar)*
NameStartChar ::= ":" | [A-Z] | ...
..(n.b., NameStartChar states that a colon can appear as the first character), suggesting the following is a valid XML document:
<?xml version="1.0" ?><:doc></:doc>
..but every parser I try it in reports the colon as a well-formedness error.
Also, Appendix B (though now a deprecated part of the document) explicitly states:
Characters ':' and '_' are allowed as name-start characters.
..and:
<?xml version="1.0" ?><_doc></_doc>
..is accepted by the XML parsers I've tried.
So, is a colon a valid first character in a tag name, meaning the parsers I'm using are wrong, or am I reading the specification wrong?
Yes, at the base XML level, colon (:) is allowed as a name-start character. The BNF rules you cite clearly specify this.
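To make this concrete, here is a rough Python sketch (my own illustration, not part of either Recommendation) that transcribes the NameStartChar and NameChar productions from XML 1.0 Fifth Edition into a regular expression. The helper is_xml_name is a hypothetical name, and it checks only the Name production, not the well-formedness of a whole document.

```python
import re

# Character classes transcribed from the XML 1.0 (Fifth Edition) productions.
# NameStartChar: ":" is listed first, alongside [A-Z], "_" and [a-z].
NAME_START_CHAR = (
    ":A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
# NameChar ::= NameStartChar | "-" | "." | [0-9] | #xB7
#              | [#x0300-#x036F] | [#x203F-#x2040]
NAME_CHAR = NAME_START_CHAR + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"

# Name ::= NameStartChar (NameChar)*
NAME_RE = re.compile(f"[{NAME_START_CHAR}][{NAME_CHAR}]*")


def is_xml_name(s: str) -> bool:
    """True if s matches the base (non-namespace) XML 1.0 Name production."""
    return NAME_RE.fullmatch(s) is not None


for candidate in [":doc", "_doc", "a:b:c", "1doc", "-doc"]:
    print(f"{candidate!r:8} {is_xml_name(candidate)}")
```

Under this rule ':doc', '_doc' and even 'a:b:c' are all valid Names, while '1doc' and '-doc' are not; only the Namespaces rules narrow the use of the colon further.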
However, the W3C XML Recommendation is clear that colons should not be used except for namespace purposes:
Note:
The Namespaces in XML Recommendation [XML Names] assigns a
meaning to names containing colon characters. Therefore, authors
should not use the colon in XML names except for namespace purposes,
but XML processors must accept the colon as a name character.
And the Namespaces in XML BNF rules for tags are based on QName, which allows a colon in a name only as the separator between Prefix and LocalPart:
QName ::= PrefixedName | UnprefixedName
PrefixedName ::= Prefix ':' LocalPart
UnprefixedName ::= LocalPart
Prefix ::= NCName
LocalPart ::= NCName
NCName ::= Name - (Char* ':' Char*) /* An XML Name, minus the ":" */
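Checking the same names against these Namespaces productions gives a different answer. The sketch below (again Python, again my own illustration; split_qname is a hypothetical helper) builds NCName from the XML 1.0 character ranges with the colon removed, then builds QName on top of it. A name like ':doc' fails because the part before the colon would have to be a non-empty NCName, and 'a:b:c' fails because a LocalPart cannot contain another colon.

```python
import re

# NCName ::= Name - (Char* ':' Char*): the XML 1.0 Name character classes
# with ":" taken out, so an NCName can never contain a colon.
NC_START = (
    "A-Z_a-z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
NC_CHAR = NC_START + "\\-.0-9\u00B7\u0300-\u036F\u203F-\u2040"
NCNAME = f"[{NC_START}][{NC_CHAR}]*"

# QName ::= PrefixedName | UnprefixedName
# PrefixedName ::= Prefix ':' LocalPart, UnprefixedName ::= LocalPart
QNAME_RE = re.compile(f"(?:(?P<prefix>{NCNAME}):)?(?P<local>{NCNAME})")


def split_qname(s: str):
    """Return (prefix, local_part) if s is a QName, otherwise None."""
    m = QNAME_RE.fullmatch(s)
    if m is None:
        return None
    return m.group("prefix"), m.group("local")


for candidate in ["doc", "xsl:template", ":doc", "a:b:c"]:
    print(f"{candidate!r:15} {split_qname(candidate)}")
```

Here 'doc' splits into (None, 'doc') and 'xsl:template' into ('xsl', 'template'), while ':doc' and 'a:b:c' are rejected.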
One might ask why colon wasn't disallowed in NameStartChar from the beginning. If we're lucky, C. M. Sperberg-McQueen may offer an authoritative explanation. However, I suspect it's a matter of an evolving notion of how namespaces were expected to be designed.
The first published working draft of the W3C XML Recommendation, in 1996, had a definition of STag that did not allow a colon:
STag ::= '<' Name (S Attribute)* S? '>'
Name ::= (Letter | '-') (Letter | Digit | '-' | '.')*
By 1998, colons were allowed in Name:
Name ::= (Letter | '_' | ':') (NameChar)*
and an earlier form of the admonition about colon use read:
Note: The colon character within XML names is reserved for experimentation with name spaces. Its meaning is expected to be
standardized at some future point, at which point those documents
using the colon for experimental purposes may need to be updated.
(There is no guarantee that any name-space mechanism adopted for XML
will in fact use the colon as a name-space delimiter.) In practice,
this means that authors should not use the colon in XML names except
as part of name-space experiments, but that XML processors should
accept the colon as a name character.
The need was anticipated, but the precise form was perhaps not yet known when the colon was first introduced into tag names.
Names beginning with a colon are allowed in non-namespace-aware XML but not in namespace-aware XML. More specifically, the base XML Recommendation allows them but the Namespaces Recommendation prohibits them. Very few people nowadays use non-namespace-aware XML (and I'm not sure which parsers support it), so it's best to assume they aren't allowed.
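One way to see both behaviours from a single library: Python's xml.parsers.expat module wraps the Expat parser, and ParserCreate only turns on namespace processing when a namespace_separator is passed. The sketch below (try_parse is a hypothetical helper) parses the document from the question in both modes. Based on how Expat treats the colon when namespace processing is on, the plain parser should report a root element named ':doc', while the namespace-aware one should raise ExpatError; '<_doc>' is accepted either way.

```python
import xml.parsers.expat

DOC = '<?xml version="1.0" ?><:doc></:doc>'


def try_parse(data: str, namespace_aware: bool) -> str:
    """Parse data with pyexpat; return the root element name or the error."""
    if namespace_aware:
        # Passing namespace_separator enables Expat's namespace processing.
        parser = xml.parsers.expat.ParserCreate(namespace_separator="}")
    else:
        # No separator: plain XML 1.0 parsing, where ":" is an ordinary name char.
        parser = xml.parsers.expat.ParserCreate()
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    try:
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as exc:
        return f"error: {exc}"
    return names[0]


print(try_parse(DOC, namespace_aware=False))
print(try_parse(DOC, namespace_aware=True))
print(try_parse("<_doc></_doc>", namespace_aware=True))
```

This also suggests why the parsers in the question object: most XML toolkits enable namespace processing by default.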