XML validation against DTD

Posted 2019-07-23 12:03

Question:

I have this XML along with an embedded DTD:

<?xml version="1.0" ?>
<!DOCTYPE customers [
<!ELEMENT customers (name,age,roll,sex)>
<!ELEMENT name (#CDATA)>
<!ELEMENT age (#CDATA)>
<!ELEMENT roll (#CDATA)>
<!ELEMENT sex (#CDATA)>
]>
<customers>
<name>XYZ</name>
<age>19</age>
<roll>23</roll>
<sex>M</sex>
</customers>

When I try to validate the XML, the validator reports an error. But if I change #CDATA to #PCDATA, the validation succeeds.

Question 1) I don't have a proper explanation for why this happens. As I understand it, the only difference between the two is that #CDATA is not parsed whereas #PCDATA is. In that case, both validations should succeed, right? Please explain where I am wrong, since the result of this validation contradicts my understanding.

Regards,

Answer 1:

There is no such thing as #CDATA available for use in XML DTDs. It is an unknown keyword. That's why you get an error.

CDATA is a keyword that is used when declaring attributes in a DTD. You cannot declare an element to be of type CDATA (or #CDATA).
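As a rough illustration of the two points above, the sketch below feeds the question's document to Python's standard-library parser twice, once with #CDATA and once with #PCDATA in the element declarations. Note the assumptions: the standard library (expat) does not validate against a DTD, so this only shows that the internal subset is checked for syntax; the helper name dtd_syntax_ok and the lang attribute are hypothetical, added only to show where CDATA is legal.

```python
import xml.etree.ElementTree as ET

# The document from the question, with the content keyword left as a slot.
TEMPLATE = """<?xml version="1.0" ?>
<!DOCTYPE customers [
<!ELEMENT customers (name,age,roll,sex)>
<!ELEMENT name ({kw})>
<!ELEMENT age ({kw})>
<!ELEMENT roll ({kw})>
<!ELEMENT sex ({kw})>
<!-- CDATA is legal as an attribute type (hypothetical attribute) -->
<!ATTLIST name lang CDATA #IMPLIED>
]>
<customers>
<name lang="en">XYZ</name>
<age>19</age>
<roll>23</roll>
<sex>M</sex>
</customers>
"""


def dtd_syntax_ok(keyword):
    """Return True if the parser accepts the document with this keyword."""
    try:
        ET.fromstring(TEMPLATE.format(kw=keyword))
        return True
    except ET.ParseError:
        return False


print("#CDATA accepted: ", dtd_syntax_ok("#CDATA"))
print("#PCDATA accepted:", dtd_syntax_ok("#PCDATA"))
```

A validating parser would additionally check that the document content matches the declarations; this sketch only exercises the syntax of the declarations themselves.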

The string CDATA is also found in CDATA sections (<![CDATA[ ... ]]>), which are entirely different things. They can be used in an XML document to escape characters (such as &) that would otherwise be interpreted as markup. CDATA sections are not declared in the DTD; they are simply used when needed.

If you have markup such as <name>L&T</name>, where the & is neither escaped as &amp; nor enclosed in a CDATA section, the parser will reject the document as not well-formed. It does not matter how the name element is declared in the DTD.
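The L&T case can be checked with a few lines of Python. This is a minimal sketch using the standard-library parser; the helper name name_text is made up for the example. No DTD is involved at all, which is the point: the bare & fails at the well-formedness stage.

```python
import xml.etree.ElementTree as ET


def name_text(xml):
    """Return the element's text, or None if the XML is not well-formed."""
    try:
        return ET.fromstring(xml).text
    except ET.ParseError:
        return None


# A bare ampersand is not well-formed, whatever the DTD says.
print(name_text("<name>L&T</name>"))              # None

# Escaping the ampersand or using a CDATA section both yield "L&T".
print(name_text("<name>L&amp;T</name>"))          # L&T
print(name_text("<name><![CDATA[L&T]]></name>"))  # L&T
```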



Answer 2:

A CDATA section starts with <![CDATA[ and ends with ]]>, for example <sex><![CDATA[M]]></sex>. It is intended for text that the XML parser should pass through as literal character data without interpreting it as markup: significant line breaks, special characters such as & and <, or XML markup used as a plain text string. The only thing it cannot directly contain is ]]>.
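The ]]> restriction is usually worked around by ending the section in the middle of the sequence and opening a new one. The following is a small sketch of that idea, assuming Python's standard-library parser; cdata_wrap and roundtrip are hypothetical helper names, not part of any library.

```python
import xml.etree.ElementTree as ET


def cdata_wrap(text):
    """Wrap text in a CDATA section, splitting any ']]>' across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def roundtrip(text):
    """Embed text in a <sex> element via CDATA and read it back."""
    return ET.fromstring("<sex>" + cdata_wrap(text) + "</sex>").text


print(cdata_wrap("M"))                # <![CDATA[M]]>
print(roundtrip("M"))                 # M
print(roundtrip("a < b && c ]]> d"))  # a < b && c ]]> d
```

To the application, the parsed text is the same whether it was written as plain character data or inside one or more CDATA sections.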

PCDATA stands for "parsed character data": ordinary text in which the parser still recognizes markup and entity references.

So #PCDATA is what you really need in the element declarations.