I am reading Extensible Markup Language (XML) 1.0 (Fifth Edition), W3C Recommendation 26 November 2008.
Section 3.2, Element Type Declarations, has:
An element type declaration takes the form:
Element Type Declaration
elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'
contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
And section 3.2.1, Element Content, has:
Element-content Models
children ::= (choice | seq) ('?' | '*' | '+')?
cp ::= (Name | choice | seq) ('?' | '*' | '+')?
choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'
seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'
This left me with a question: what is the difference between 'contentspec' and 'content model'?
My guess:
contentspec is ANY, PCDATA, Mixed, children.
And only children has a 'content model': (elemName1 | elemName2, elemName3, elemET).
Names, the separators | and , the occurrence indicators '?' '*' '+', sequences, and choices: all of that is the 'content model'. Right?
Does Mixed have a 'content model'?
Tutorials often write:
<!ELEMENT Name content_model >
You almost got it:
contentspec is EMPTY, ANY, Mixed, or children (#PCDATA is not an alternative of its own; it only appears inside Mixed).
And only children has a 'content model'.
contentspec describes all the types of content an element can have:
- EMPTY: the element has no content
- ANY: character data and any other elements declared in the DTD, in any order -- kind of free-form
- Mixed content, which is described in 3.2.2 of the XML recommendation
- children, which is described in 3.2.1 of the XML recommendation
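As a rough illustration of the four alternatives, here is a minimal sketch of a document with an internal DTD subset. The element names (doc, br, para, misc) are made up for this example and do not come from the recommendation; each declaration uses one alternative of contentspec.

```xml
<?xml version="1.0"?>
<!DOCTYPE doc [
  <!-- children: an element-content model, here a sequence of three elements -->
  <!ELEMENT doc   (br, para, misc)>
  <!-- EMPTY: the element must have no content -->
  <!ELEMENT br    EMPTY>
  <!-- Mixed: character data interspersed with the listed element types -->
  <!ELEMENT para  (#PCDATA | br)*>
  <!-- ANY: character data and any declared elements, in any order -->
  <!ELEMENT misc  ANY>
]>
<doc>
  <br/>
  <para>some text<br/>more text</para>
  <misc>text <para>nested</para> <br/></misc>
</doc>
```

A validating parser should accept this instance; for example, xmllint --valid can be used to check it.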
A content model can use sequences, choices, parentheses, and occurrence indicators, e.g. ((a|b)|(c+, d?, e*))?, and it can only reference other elements; #PCDATA is not allowed here.
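To make the example model ((a|b)|(c+, d?, e*))? concrete, the sketch below attaches it to a hypothetical element named item, inside a hypothetical root element. The names a to e are taken from the model itself; declaring them EMPTY is only an assumption to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE root [
  <!ELEMENT root (item+)>
  <!-- either one a or one b, or the sequence c+, d?, e*; the whole group is optional -->
  <!ELEMENT item ((a|b)|(c+, d?, e*))?>
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
  <!ELEMENT d EMPTY>
  <!ELEMENT e EMPTY>
]>
<root>
  <!-- valid: the trailing ? makes the whole model optional -->
  <item/>
  <!-- valid: first branch of the choice -->
  <item><a/></item>
  <item><b/></item>
  <!-- valid: second branch, with two c, no d, two e -->
  <item><c/><c/><e/><e/></item>
  <!-- valid: second branch, with one c and one d -->
  <item><c/><d/></item>
</root>
```

By contrast, an item containing both a and c, or an item containing text, would not match this model, because the two branches are alternatives and #PCDATA is not part of it.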
Mixed content is a special case: it can only use choice, which makes it distinct from a content model. Mixed content is either (#PCDATA) or something like (#PCDATA | a | b | c)*. In the latter case you open a parenthesis (, #PCDATA must come first, then you list the allowed elements separated by the choice operator |, and you finish with the closing parenthesis ) followed by the zero-or-more occurrence indicator *.
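Both forms of mixed content can be sketched as follows. The element names (note, title, body, em, code) are invented for illustration: title uses the text-only form, body uses the form with a list of allowed elements.

```xml
<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note  (title, body)>
  <!-- text only: no child elements allowed -->
  <!ELEMENT title (#PCDATA)>
  <!-- text interleaved with em and code; #PCDATA comes first, * closes the group -->
  <!ELEMENT body  (#PCDATA | em | code)*>
  <!ELEMENT em    (#PCDATA)>
  <!ELEMENT code  (#PCDATA)>
]>
<note>
  <title>Mixed content</title>
  <body>Put <code>#PCDATA</code> <em>first</em>, then the element names.</body>
</note>
```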
The consequence of all this is stated in 3.2.2, Mixed Content:
In this case, the types of the child elements may be constrained, but not their order or their number of occurrences
In particular, it is not possible to define an element:
- that can contain either text (#PCDATA) or a sequence of elements: e.g. ((#PCDATA) | (a, b, c)) is not valid
- that must start with an element, followed by text, then other elements: e.g. (a, #PCDATA, b, c) is also not valid
You also cannot ensure that an element with mixed content has any content at all in an XML instance (it can remain empty).
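The sketch below illustrates these consequences with the model (#PCDATA | a | b | c)* on a hypothetical element p, wrapped in a hypothetical list element. Declaring a, b, and c as EMPTY is an assumption made only to keep the example small. Every p in the instance should validate, whatever the order or number of its children.

```xml
<?xml version="1.0"?>
<!DOCTYPE list [
  <!ELEMENT list (p*)>
  <!ELEMENT p    (#PCDATA | a | b | c)*>
  <!ELEMENT a EMPTY>
  <!ELEMENT b EMPTY>
  <!ELEMENT c EMPTY>
]>
<list>
  <!-- text and elements in one order -->
  <p>text <a/> <b/> <c/></p>
  <!-- a different order, with c repeated and b absent -->
  <p><c/><c/><a/>text after</p>
  <!-- a single element and no text -->
  <p><b/></p>
  <!-- no content at all is also valid -->
  <p></p>
</list>
```

If order or occurrence counts matter, the element has to be declared with an element-content model instead, which in turn means giving up text directly inside that element.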