I'm using DOMDocument and SimpleXMLElement to create a formatted XML file. While this all works, the resulting file is saved as ASCII, not as UTF-8. I can't find an answer as to how to change that.
The XML is created like this:
$XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9";
$rootNode = new \SimpleXMLElement("<?xml version='1.0' encoding='UTF-8'?><urlset></urlset>");
$rootNode->addAttribute('xmlns', $XMLNS);
$url = $rootNode->addChild('url');
$url->addChild('loc', "Somewhere over the rainbow");
// Producing an indented file requires a DOMDocument...
$dom = dom_import_simplexml($rootNode)->ownerDocument;
$dom->formatOutput = true;
$path = "C:\\temp";
// This saves an ASCII file
$dom->save($path.'/sitemap.xml');
The resulting XML looks like this (which, I think, is as it should be):
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>Somewhere over the rainbow</loc>
  </url>
</urlset>
Unfortunately, the file is ASCII-encoded, not UTF-8.
How do I fix this?
Edit: don't use Notepad++ to check the encoding
I've got it working now, thanks to the accepted answer below. One note: I used Notepad++ to open the file and check its encoding. However, when I regenerated the file, Notepad++ would refresh its tab and, for some reason, report ANSI as the encoding. Closing and reopening the same file in Notepad++ would then report UTF-8 again. This caused me a lot of confusion.
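A file containing only ASCII characters is byte-for-byte identical whether you call it ASCII or UTF-8 (without a BOM), so an editor has nothing to go on and may label it either way. Rather than relying on an editor's guess, you can inspect the bytes from PHP. This is a minimal sketch, assuming the mbstring extension is loaded; `describeEncoding()` is a hypothetical helper name, and the path is the one from the question:

```php
<?php
// Sketch: inspect a file's bytes instead of trusting an editor's encoding label.
// Assumes the mbstring extension is loaded; describeEncoding() is a hypothetical helper.
function describeEncoding(string $bytes): string
{
    if (!mb_check_encoding($bytes, 'UTF-8')) {
        return 'not valid UTF-8';
    }
    // No byte above 0x7F means pure 7-bit ASCII, which is also valid UTF-8.
    if (preg_match('/[\x80-\xFF]/', $bytes) === 0) {
        return 'ASCII-only (also valid UTF-8)';
    }
    return 'UTF-8 with non-ASCII characters';
}

// Path from the question; only checked if the file actually exists.
$path = 'C:/temp/sitemap.xml';
if (is_file($path)) {
    echo describeEncoding(file_get_contents($path)), PHP_EOL;
}
```

For the sitemap in the question, which contains no non-ASCII characters, this should report "ASCII-only": in that case the ASCII and UTF-8 labels describe exactly the same bytes.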
I think there are a couple of things going on here. For one, you need:
$dom->encoding = 'utf-8';
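To see what the `encoding` property actually changes, here is a minimal sketch (assuming the dom extension; `buildDoc()` and the sample text `café` are made up for illustration). With no encoding set, libxml escapes non-ASCII text as character references, so the output stays pure ASCII. With `UTF-8` set, the text is written as raw UTF-8 bytes and the XML declaration says so:

```php
<?php
// Sketch: how DOMDocument::$encoding affects serialization of non-ASCII text.
// buildDoc() is a hypothetical helper; "café" is illustrative sample data.
function buildDoc(?string $encoding): DOMDocument
{
    $dom = new DOMDocument('1.0');
    if ($encoding !== null) {
        // Same kind of assignment as $dom->encoding = 'utf-8'; above.
        $dom->encoding = $encoding;
    }
    $loc = $dom->appendChild($dom->createElement('loc'));
    // DOM methods expect UTF-8 input; "\u{E9}" is the UTF-8 encoding of "é".
    $loc->appendChild($dom->createTextNode("caf\u{E9}"));
    return $dom;
}

$withoutEncoding = buildDoc(null)->saveXML();
$withEncoding = buildDoc('UTF-8')->saveXML();

echo $withoutEncoding; // "é" is escaped as a character reference
echo $withEncoding;    // "é" is written as raw UTF-8, declaration says encoding="UTF-8"
```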
But I also think we should try creating the DOMDocument manually, specifying the proper encoding. So:
<?php
$XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9";
$rootNode = new \SimpleXMLElement("<?xml version='1.0' encoding='UTF-8'?><urlset></urlset>");
$rootNode->addAttribute('xmlns', $XMLNS);
$url = $rootNode->addChild('url');
$url->addChild('loc', "Somewhere over the rainbow");
// Producing an indented file requires a DOMDocument...
// dom_import_simplexml() returns the root DOMElement; importNode() cannot import a whole DOMDocument.
$domSxe = dom_import_simplexml($rootNode);
// Set the source document's encoding to UTF-8.
$domSxe->ownerDocument->encoding = 'UTF-8';
$dom = new DOMDocument('1.0', 'UTF-8');
$domSxe = $dom->importNode($domSxe, true);
$domSxe = $dom->appendChild($domSxe);
$path = "C:\\temp";
$dom->formatOutput = true;
$dom->save($path.'/sitemap.xml');
Also ensure that any element values or CDATA you're adding are actually UTF-8 (see utf8_encode(), which only converts from ISO-8859-1 and is deprecated as of PHP 8.2).
Using the example above, this works for me:
php > var_dump($utf8);
string(11) "ᙀȾᎵ⁸"
php > $XMLNS = "http://www.sitemaps.org/schemas/sitemap/0.9";
php > $rootNode = new \SimpleXMLElement("<?xml version='1.0' encoding='UTF-8'?><urlset></urlset>");
php > $rootNode->addAttribute('xmlns', $XMLNS);
php > $url = $rootNode->addChild('url');
php > $url->addChild('loc', "Somewhere over the rainbow $utf8");
php > $domSxe = dom_import_simplexml($rootNode);
php > $domSxe->encoding = 'UTF-8';
php > $dom = new DOMDocument('1.0', 'UTF-8');
php > $domSxe = $dom->importNode($domSxe, true);
php > $domSxe = $dom->appendChild($domSxe);
php > $dom->save('./sitemap.xml');
$ cat ./sitemap.xml
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"><url><loc>Somewhere over the rainbow ᙀȾᎵ⁸</loc></url></urlset>
Your data is probably not in UTF-8. If it is ISO-8859-1 (Latin-1), you can convert it like so:
utf8_encode($yourData);
Or, maybe:
iconv('ISO-8859-1', 'UTF-8', $yourData);
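On current PHP versions, mb_convert_encoding() can stand in for utf8_encode(). Also note that running either conversion on data that is already UTF-8 double-encodes it. A minimal sketch of a guarded wrapper, assuming mbstring is available and that any input which is not valid UTF-8 is ISO-8859-1 (`toUtf8()` is a hypothetical name):

```php
<?php
// Sketch: make sure text is UTF-8 before adding it to the XML.
// Assumption: input that is not valid UTF-8 is ISO-8859-1 (Latin-1).
// Requires the mbstring extension; toUtf8() is a hypothetical helper.
function toUtf8(string $data): string
{
    if (mb_check_encoding($data, 'UTF-8')) {
        // Already UTF-8: converting again would double-encode it.
        return $data;
    }
    // Same conversion as utf8_encode(), which is deprecated as of PHP 8.2.
    return mb_convert_encoding($data, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "caf\xE9";           // "café" encoded as ISO-8859-1
$converted = toUtf8($latin1);  // "caf\xC3\xA9", i.e. "café" in UTF-8
```

The validity check is only a heuristic: a Latin-1 string whose bytes happen to form valid UTF-8 (for example `Ã©`, bytes C3 A9) is left untouched.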