Symfony2 Crawler - Use UTF-8 with XPATH

Posted 2020-07-23 08:45

Question:

I am using the Symfony2 Crawler (the DomCrawler component) to run XPath queries. Everything works fine except the encoding.

I would like to use UTF-8 encoding, but the Crawler does not seem to use it. I noticed this because every &nbsp; is converted to Â, which is a known issue: UTF-8 Encoding Issue

My question is: how can I force the Symfony Crawler to use UTF-8 encoding?

Here is the code I am using:

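use Symfony\Component\DomCrawler\Crawler;

// Create the document and request UTF-8.
$dom_input = new \DOMDocument("1.0", "UTF-8");
$dom_input->encoding = "UTF-8";
$dom_input->formatOutput = true;

// Parse the HTML file ($myFile holds its path).
$dom_input->loadHTMLFile($myFile);

// Wrap the document in a Crawler and select every <p> element.
$crawler = new Crawler($dom_input);
$paragraphs = $crawler->filterXPath('descendant-or-self::p');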

Then I dump the text of each paragraph:

foreach($paragraphs as $paragraph) {
    var_dump($paragraph->nodeValue);
}

As soon as a paragraph contains a &nbsp;, I get Â in the output.

Thank you very much in advance.

Answer 1:

Thanks to @halfer, I found a workaround:

Instead of using

$crawler = new Crawler($dom_input);

I used:

$crawler = new Crawler();
$crawler->addHtmlContent(utf8_decode($dom_input->saveXML()));
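
For comparison, here is a minimal sketch of a different approach (not the workaround above): telling the HTML parser up front that the input is UTF-8, so the text is decoded correctly at load time. It uses plain DOMDocument so that it runs without Symfony installed. The sample string and the helper name firstParagraphText are made up for illustration, and the sketch assumes the input carries no charset declaration of its own.

```php
<?php
// Hypothetical sample: a UTF-8 fragment containing "é" and a
// non-breaking space (U+00A0), with no charset declaration.
$html = "<p>caf\u{00E9}\u{00A0}au lait</p>";
$expected = "caf\u{00E9}\u{00A0}au lait";

// Hypothetical helper: parse $markup and return the text of the first <p>.
function firstParagraphText(string $markup): string
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($markup);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom->getElementsByTagName('p')->item(0)->nodeValue;
}

// No charset hint: the parser has to guess the input encoding, and on many
// libxml2 builds it falls back to a single-byte encoding.
$withoutHint = firstParagraphText($html);

// Charset hint prepended: the parser is told that the bytes are UTF-8.
$meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">';
$withHint = firstParagraphText($meta . $html);

var_dump($withoutHint === $expected); // result depends on the libxml2 build
var_dump($withHint === $expected);
```

If this holds in your environment, a document loaded this way could be passed to new Crawler(...) just as in the question. That may be worth trying because utf8_decode() is deprecated as of PHP 8.2. For a file, the same hint would have to be prepended to the contents read with file_get_contents() before calling loadHTML().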