Why does XML report an error for certain special characters while others are fine?
For instance, the document below produces an error:
<?xml version="1.0" standalone="yes"?>
<Customers>
<Customer>
<Name>L&ouml;ic</Name>
</Customer>
</Customers>
but this one is fine:
<?xml version="1.0" standalone="yes"?>
<Customers>
<Customer>
<Name>&amp;</Name>
</Customer>
</Customers>
By the way, I convert the special characters in PHP with htmlentities('Löic', ENT_QUOTES).
How can I get around this?
Thanks.
EDIT:
I found that it works fine if I use a numeric character reference such as L&#243;ic
Now I have to find out how to use PHP to convert special characters into numeric character references!
There are five entities defined in the XML specification: &amp;, &lt;, &gt;, &apos; and &quot;.
There are lots of entities defined in the HTML DTD.
You can't use the ones from HTML in generic XML.
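To see the difference in practice, here is a minimal sketch in Python (used only for illustration; the question itself is about PHP) that feeds three variants of the Name element to the standard library parser. The helper name try_parse is made up for this example.

```python
import xml.etree.ElementTree as ET

def try_parse(name_markup):
    """Return the parsed <Name> text, or None if the document is not well-formed."""
    doc = (
        '<?xml version="1.0" standalone="yes"?>'
        "<Customers><Customer><Name>" + name_markup + "</Name></Customer></Customers>"
    )
    try:
        return ET.fromstring(doc).findtext("Customer/Name")
    except ET.ParseError:
        # The parser rejects references to entities it has no declaration for.
        return None

predefined = try_parse("&amp;")       # one of the five predefined XML entities
html_named = try_parse("L&ouml;ic")   # HTML-only entity, undeclared in this document
numeric = try_parse("L&#243;ic")      # numeric character reference, always allowed
```

With this setup, only the HTML-named variant should fail to parse; the predefined entity and the numeric reference both come back as ordinary text.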
You could use numeric references, but you would probably be better off just getting your character encodings straight, which basically boils down to:
- Set your editor to save the data in UTF-8
- If you process the data with a programming language, make sure it is UTF-8 aware
- If you store the data in a database, make sure it is configured for UTF-8
- When you serve up your document, make sure the HTTP headers specify that it is UTF-8 (in the case of XML, UTF-8 is the default, so not specifying anything is almost as good)
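Both routes can be sketched roughly as follows. This is a Python illustration, not PHP, and it assumes Python 3.8+ for the xml_declaration argument; the variable names are just for the example. The first part keeps the literal character and writes the document as UTF-8. The second part falls back to numeric references for anything outside ASCII.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

name = "L\u00f6ic"  # "Löic", written with an escape so the source file encoding does not matter

# Preferred route: keep the literal character and serialize the document as UTF-8.
root = ET.Element("Customers")
customer = ET.SubElement(root, "Customer")
ET.SubElement(customer, "Name").text = name
utf8_bytes = ET.tostring(root, encoding="utf-8", xml_declaration=True)

# Fallback route: escape &, < and > first, then replace every non-ASCII
# character with a numeric character reference.
numeric_refs = escape(name).encode("ascii", "xmlcharrefreplace").decode("ascii")

# The UTF-8 document parses back to the original text.
roundtrip = ET.fromstring(utf8_bytes).findtext("Customer/Name")
```

In PHP the same idea would apply: emit the text as UTF-8 and escape only the markup characters, rather than running it through htmlentities.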
Because it is not a built-in entity. It is defined externally (in the HTML DTD), so it needs to be declared in a DTD before you can use it in XML.
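For illustration, a declaration in the document's internal DTD subset could look like the sketch below. It uses Python's standard library parser purely as a way to check the result, and it assumes the parser expands internal entities, which the expat-based ElementTree parser does for small cases like this.

```python
import xml.etree.ElementTree as ET

# The DOCTYPE declares &ouml; locally, so the reference in <Name> can be resolved.
doc = """<?xml version="1.0"?>
<!DOCTYPE Customers [
  <!ENTITY ouml "&#246;">
]>
<Customers>
  <Customer>
    <Name>L&ouml;ic</Name>
  </Customer>
</Customers>"""

name = ET.fromstring(doc).findtext("Customer/Name")
```

Note that every entity you want to use has to be declared this way, which is why numeric references or plain UTF-8 are usually less work.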