This is my code:
$oDom = new DOMDocument();
$oDom->loadHTML("èàéìòù");
echo $oDom->saveHTML();
This is the output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>èà éìòù</p></body></html>
I want this output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><body><p>èàéìòù</p></body></html>
I've tried with ...
$oDom = new DomDocument('4.0', 'UTF-8');
or with 1.0 and other variations, but nothing worked.
Another thing ...
Is there a way to get the same HTML back untouched?
For example, with this HTML as input: <p>hello!</p>
I want to get the same output: <p>hello!</p>
using DOMDocument only to parse the DOM and make some substitutions inside the tags.
This way:
Solution:
The saveHTML() method behaves differently when you pass it a node. You can pass the root node ($oDom->documentElement) and add the desired !DOCTYPE manually. Another important piece is utf8_decode(). In my case, none of the other attributes and methods of the DOMDocument class produced the desired result.
Try setting the encoding after you have loaded the HTML.
Another way
Looks like you just need to set substituteEntities when you create the DOMDocument object.
The issue appears to be known, according to the user comments on the manual page at php.net. Solutions suggested there include putting
in the document before you put any strings with non-ASCII chars in.
Another hack suggests putting
as the first text in the document and then removing it at the end.
Nasty stuff. Smells like a bug to me.
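The exact snippets from those comments are not reproduced here. As a hedged illustration of the first idea, the sketch below assumes the declaration in question is a meta charset tag prepended to the input. The tag text and the variable names are assumptions, not quotes from the manual page.

```php
<?php
$sInput = 'èàéìòù'; // UTF-8 source text

$oDom = new DOMDocument();
// Assumed declaration: tells the HTML parser that the bytes that follow are UTF-8.
$sMeta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$oDom->loadHTML($sMeta . $sInput);

// Read the text back from the tree to check that it was decoded as UTF-8.
$sText = trim($oDom->documentElement->textContent);
echo $sText, "\n";
```

The meta element ends up in the parsed document, so it would need to be removed again (or skipped during serialization) if the output must not contain it.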
I don't know why the marked answer didn't work for my problem. But this one did.
ref: https://www.php.net/manual/en/class.domdocument.php