Convert parsed text, with PHP, to UTF-8

Posted 2019-09-06 11:02

Question:

Following up on my previous question about parsing images and text from complex XML, the only remaining problem is that I don't get the right encoding. The text is in Greek, and the XML file is UTF-8 encoded. This is the code that parses the XML:

$xml = simplexml_load_file('myfile.xml');

$descriptions = $xml->xpath('//item/description');

foreach ( $descriptions as $description_node ) {

    $description_dom = new DOMDocument();
    $description_dom->loadHTML( (string)$description_node );

    $description_sxml = simplexml_import_dom( $description_dom );

    $imgs = $description_sxml->xpath('//img');
    $text = $description_sxml->xpath('//div');

    foreach ($imgs as $image) {
        echo (string)$image['src'];
    }

    foreach ($text as $t) {
        echo (string)$t;
    }
}

If I echo $description_node, the text looks fine, but after I build $description_dom and import it with simplexml_import_dom, it looks like this: Ïε ιÏÎ»Î±Î¼Î¹ÎºÎ­Ï ÎºÎ¿Î¹Î½ÏÏηÏεÏ. Using mb_convert_encoding turns it into: ýÃÂñù" ÃÂ. What am I doing wrong?

Answer 1:

Solution: right after $description_dom = new DOMDocument(); I added this line:

$description_html = mb_convert_encoding($description_node, 'HTML-ENTITIES', "UTF-8");

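This converts the UTF-8 characters into HTML entities (not the other way round). DOMDocument::loadHTML() assumes ISO-8859-1 when the markup declares no charset, so passing plain-ASCII entities avoids the misreading. Instead of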

$description_dom->loadHTML( (string)$description_node );

I now load the converted HTML:

$description_dom->loadHTML( (string)$description_html );
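
A side note for newer PHP versions: the 'HTML-ENTITIES' target of mb_convert_encoding is deprecated as of PHP 8.2. A commonly used alternative is to prepend an XML encoding declaration so that libxml reads the fragment as UTF-8. Below is a minimal sketch of that approach, not the method from this answer. The helper name loadUtf8Fragment() and the sample Greek string are made up for illustration.

```php
<?php
// Sketch: parse a UTF-8 HTML fragment with DOMDocument without mojibake.
// loadUtf8Fragment() is a hypothetical helper, not part of the original post.

function loadUtf8Fragment(string $html): DOMDocument
{
    $dom = new DOMDocument();

    // Collect libxml warnings internally instead of printing them;
    // fragments without <html>/<body> often trigger a few.
    $previous = libxml_use_internal_errors(true);

    // The XML encoding declaration acts as a hint that the bytes are UTF-8,
    // overriding the ISO-8859-1 default used by loadHTML().
    $dom->loadHTML('<?xml encoding="UTF-8">' . $html);

    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $dom;
}

// Sample input (assumed for illustration only).
$dom = loadUtf8Fragment('<div>Καλημέρα κόσμε</div>');
$div = $dom->getElementsByTagName('div')->item(0);

echo $div->textContent, PHP_EOL;
```

In the loop from the question, this would replace the loadHTML() call, for example $description_dom = loadUtf8Fragment((string)$description_node);, and the rest of the code could stay as it is.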


Answer 2:

Add this to the head of the HTML page where you want the text to be displayed:

<meta http-equiv='Content-Type' content='text/html; charset=utf-8'>

This should render the characters properly.



Answer 3:

Do not convert anything. Just send the output with the proper charset declaration:

header("Content-Type: text/plain; charset=utf-8");

This is all you need to do. Do it at the top of your file, before any output is sent.