I use DOMDocument to manipulate HTML in PHP 7. The problem is that the (Cyrillic) text displays correctly on the page, but when I open "View page source" it looks wrong. It shows like this:
&#1047;&#1076;&#1077;&#1089;&#1100; &#1086;&#1089;&#1085; (instead of Здесь осн)
What might be wrong? The <meta> charset is utf-8. My code:
$dom = new DOMDocument();
// Convert non-ASCII characters to entities so libxml does not misread the UTF-8 input
if (@$dom->loadHTML(mb_convert_encoding("<div>$body</div>", 'HTML-ENTITIES', 'UTF-8'), LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD)) {
    // https://stackoverflow.com/questions/29493678/loadhtml-libxml-html-noimplied-on-an-html-fragment-generates-incorrect-tags
    // Unwrap the helper <div>: detach it, empty the document, then move its children to the top level
    $container = $dom->getElementsByTagName('div')->item(0);
    $container = $container->parentNode->removeChild($container);
    while ($dom->firstChild) {
        $dom->removeChild($dom->firstChild);
    }
    while ($container->firstChild) {
        $dom->appendChild($container->firstChild);
    }
    $xpath = new DOMXPath($dom);
    $headlines = $xpath->query("//h2");
    // some code..
    return $dom->saveHTML();
}
The problem is with $dom->saveHTML(); you need to pass the root node as a parameter, like this:
$dom->saveHTML($dom->documentElement);
Then it suddenly renders the page differently, with the entities substituted by the actual characters. If it does not, double-check the values of $dom->encoding and $dom->substituteEntities; they should read UTF-8 and TRUE.
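A minimal sketch of this fix applied to the question's code, assuming PHP 7+ with the dom and mbstring extensions; render_fragment() is a hypothetical helper name. It skips the unwrapping step and calls saveHTML($child) for each child of the wrapper <div>, so every top-level node of the fragment is kept (in the question's unwrapped document, $dom->documentElement would only be the first element). It also swaps mb_convert_encoding(..., 'HTML-ENTITIES', ...) for mb_encode_numericentity(), because the HTML-ENTITIES conversion is deprecated as of PHP 8.2; both turn non-ASCII characters into numeric entities before parsing.

```php
<?php
// Sketch only: render_fragment() is a hypothetical helper, assuming PHP 7+
// with the dom and mbstring extensions enabled.

function render_fragment(string $body): string
{
    $dom = new DOMDocument();

    // Replace every character >= U+0080 with a decimal entity (&#NNNN;),
    // so the parser sees pure ASCII and cannot misread the charset.
    $encoded = mb_encode_numericentity(
        "<div>$body</div>",
        [0x80, 0x10FFFF, 0, 0x1FFFFF],
        'UTF-8'
    );

    // Collect libxml warnings instead of printing them (e.g. for HTML5 tags).
    $previous = libxml_use_internal_errors(true);
    $loaded = $dom->loadHTML($encoded, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return '';
    }

    // The first <div> in document order is the wrapper added above.
    $container = $dom->getElementsByTagName('div')->item(0);

    // Process the DOM here (DOMXPath queries etc.) as in the question.

    $html = '';
    // Passing a node to saveHTML() writes its text as UTF-8, whereas the
    // argument-less saveHTML() in the question produced &#NNNN; references.
    foreach ($container->childNodes as $child) {
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

echo render_fragment('<h2>Здесь осн</h2><p>текст</p>'), PHP_EOL;
```

With this input the output should contain the Cyrillic text as plain UTF-8 rather than numeric entities, so "View page source" shows the same characters as the rendered page.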