I have this XML along with an embedded DTD:
<?xml version="1.0" ?>
<!DOCTYPE customers [
<!ELEMENT customers (name,age,roll,sex)>
<!ELEMENT name (#CDATA)>
<!ELEMENT age (#CDATA)>
<!ELEMENT roll (#CDATA)>
<!ELEMENT sex (#CDATA)>
]>
<customers>
<name>XYZ</name>
<age>19</age>
<roll>23</roll>
<sex>M</sex>
</customers>
When I try to validate the XML, it shows an error. But if I change #CDATA to #PCDATA, the validation succeeds.
Question 1) I don't have a proper explanation for why this is happening. As I understand it, the only difference between the two is that #CDATA is not parsed whereas #PCDATA is parsed. In that case, both validations should succeed, right? Please explain where I am wrong, since the result of this validation contradicts my understanding.
There is no such thing as #CDATA available for use in XML DTDs. It is an unknown keyword. That's why you get an error.
CDATA is a keyword that is used when declaring attributes in a DTD. You cannot declare an element to be of type CDATA (or #CDATA).
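To illustrate where CDATA does belong, here is a minimal sketch of an attribute declaration. The id attribute and its value are hypothetical and are not part of the document in the question; the element list is shortened to keep the example small.

```xml
<?xml version="1.0" ?>
<!DOCTYPE customers [
<!ELEMENT customers (name)>
<!ELEMENT name (#PCDATA)>
<!-- Hypothetical attribute: CDATA is an attribute type, so it appears in ATTLIST -->
<!ATTLIST customers id CDATA #REQUIRED>
]>
<customers id="c-001">
<name>XYZ</name>
</customers>
```

Here CDATA simply means that the attribute value is an arbitrary character string, while the text content of the name element is still declared with #PCDATA.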
The string CDATA is also found in CDATA sections (<![CDATA[ ... ]]>), which are an entirely different thing. They can be used in an XML document to escape characters (such as &) that would otherwise be interpreted as markup. CDATA sections are not declared in the DTD; they are simply used when needed.
If you have markup such as <name>L&T</name> (that is not enclosed in a CDATA section), then it will be rejected by the parser because the document is not well-formed. It does not matter how the name element is declared in the DTD.
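As a small illustration of that point, either of the following two alternative fragments should be accepted as well-formed. The first uses the predefined &amp; entity and the second uses a CDATA section; in both cases an application reading the name element sees the text L&T.

```xml
<!-- Option 1: escape the ampersand with the predefined entity -->
<name>L&amp;T</name>

<!-- Option 2: wrap the text in a CDATA section -->
<name><![CDATA[L&T]]></name>
```

Neither option requires any change to the DTD: the name element is declared with #PCDATA in both cases.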
A CDATA section starts with <![CDATA[ and ends with ]]>, for example: <sex><![CDATA[M]]></sex>.
It is intended for content that the XML parser should pass through as literal text without interpreting it as markup: significant line breaks, special characters, or XML markup used as a text string. The only thing it cannot directly contain is ]]>.
PCDATA stands for "parsed character data", which corresponds to ordinary text content.
So #PCDATA is what you really need.
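For reference, this is the document from the question with every #CDATA replaced by #PCDATA, which is the change the asker reports as validating successfully:

```xml
<?xml version="1.0" ?>
<!DOCTYPE customers [
<!ELEMENT customers (name,age,roll,sex)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT age (#PCDATA)>
<!ELEMENT roll (#PCDATA)>
<!ELEMENT sex (#PCDATA)>
]>
<customers>
<name>XYZ</name>
<age>19</age>
<roll>23</roll>
<sex>M</sex>
</customers>
```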