
Encoding issues with XMLWriter (PHP)

Posted 2019-02-19 08:23


Take this simple PHP code:

$xmlWriter = new XMLWriter();
$xmlWriter->startDocument('1.0', 'utf-8');

$xmlWriter->writeElement('test', $data);


The XMLWriter class has a nice feature: it will convert any data you give to it to the output encoding. For example here it will convert $data to UTF-8 because I passed 'utf-8' in the startDocument function.

The problem is that in my case $data comes from a database whose output format is UTF-8, so the string is already UTF-8. XMLWriter probably assumes the data is ISO-8859-1 and converts it to UTF-8 a second time, so I get weird symbols where I should get accented characters.

Currently I'm using utf8_decode around each string coming from the database, which means I'm converting from UTF-8 to ISO-8859-1, and then XMLWriter turns it back into UTF-8.

This works but is not clean:

$xmlWriter->writeElement('test', utf8_decode($data));

Is there a cleaner solution?

EDIT: showing a full example

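$xmlWriter = new XMLWriter();
$xmlWriter->openURI('php://output');   // assumed output target; the original snippet did not open one
$xmlWriter->startDocument('1.0', 'utf-8');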

$database = new PDO('mysql:host=localhost;dbname=xxxxx', 'xxxxx', 'xxxxx');
$database->exec('SET CHARACTER SET UTF8');
$database->exec('SET NAMES UTF8');
foreach ($database->query('SELECT name FROM usersList') as $user) {
    // if the user's name is 'hervé' in the database, it will print 'hervÃ©' instead
    $xmlWriter->writeElement('user', $user[0]);
}



I'm not sure where you got the idea that XMLWriter converts the encoding of its input. It doesn't. It can write the document in a different output encoding, but the strings you pass to it must always be UTF-8.
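Here is a minimal sketch of that behaviour, assuming the XMLWriter extension is available. The element name and sample string are made up for illustration, and the "é" is spelled out as raw bytes so the result does not depend on how the source file is saved:

```php
<?php
// "hervé" as UTF-8; the "é" is the two bytes C3 A9.
$name = "herv\xC3\xA9";

$writer = new XMLWriter();
$writer->openMemory();                   // buffer the document in memory
$writer->startDocument('1.0', 'UTF-8');
$writer->writeElement('user', $name);    // input is expected to be UTF-8 already
$writer->endDocument();
$xml = $writer->outputMemory();

// Check that the UTF-8 bytes come out exactly as they went in.
var_dump(strpos($xml, "<user>herv\xC3\xA9</user>") !== false);
```

If the check is true, XMLWriter passed the UTF-8 string through without re-encoding it, so no utf8_decode() call should be needed for correctly stored data.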

One of two things may be going on here:

  1. Whatever you are using to view your output document is interpreting the string as Windows-1252. If you are viewing your output in a browser, you may need to set the content-type header like so: header('Content-Type: application/xml; charset=UTF-8');
  2. You stored your data in your database incorrectly, and your "é" is actually the two Unicode characters "Ã©". Fixing this is difficult.
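To tell the two cases apart, it can help to look at the raw bytes of a value fetched from the database. The sketch below assumes the mbstring extension is loaded. describeBytes() is a hypothetical helper written for this example, not part of PHP or XMLWriter, and the sample strings stand in for real database values:

```php
<?php
// Hypothetical helper: render a string's bytes as uppercase hex.
function describeBytes(string $value): string
{
    return strtoupper(bin2hex($value));
}

$correct = "\xC3\xA9";          // "é" stored correctly as UTF-8 (one character, two bytes)
$doubled = "\xC3\x83\xC2\xA9";  // "Ã©": the UTF-8 bytes of "é" read as Latin-1 and encoded again

echo describeBytes($correct), "\n"; // C3A9
echo describeBytes($doubled), "\n"; // C383C2A9

// Undo one level of encoding: map each character back to a single Latin-1 byte.
$repaired = mb_convert_encoding($doubled, 'ISO-8859-1', 'UTF-8');
var_dump($repaired === $correct);
```

If a stored "é" shows up as C383C2A9, the data was double-encoded on the way in. That would also explain why the utf8_decode() workaround appears to help, since it removes one layer of encoding. Note that utf8_decode() is deprecated as of PHP 8.2, so mb_convert_encoding() is used here instead.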