I'm reading an HTML document that contains UTF-8 characters, but when I access the innerHTML of the document, all the "bad" characters show up as 0xFFFD. I've tried all the major browsers and they behave the same way. When I alert() the innerHTML, those characters appear as a diamond with a question mark.
Surprisingly, the following works perfectly, correctly displaying the UTF-8 character in the alert box, so it's not alert() that is malfunctioning:
alert("Doppelg\u00e4nger!");
Why can't I access the UTF-8 characters through innerHTML? Or is there another way to access them in JavaScript?
First, check whether the document header contains:
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
You can also read out the meta tags with JavaScript:
var metaTags = document.getElementsByTagName("META");
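From there you could inspect each tag's content attribute for the declared charset. A small sketch; the helper name charsetFromContent is my own, not a standard API:

```javascript
// Hypothetical helper: pull the charset out of a Content-Type-style
// value, e.g. "text/html; charset=UTF-8" -> "utf-8".
function charsetFromContent(content) {
    var m = /charset=([^;\s]+)/i.exec(content);
    return m ? m[1].toLowerCase() : null;
}

// In the browser you might use it like this:
// var metaTags = document.getElementsByTagName("META");
// for (var i = 0; i < metaTags.length; i++) {
//     if (/content-type/i.test(metaTags[i].httpEquiv)) {
//         alert(charsetFromContent(metaTags[i].content));
//     }
// }
```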
If it does, this is the explanation of the behavior: the file is probably not actually saved as UTF-8, so its invalid byte sequences are decoded to the replacement character. You can try changing UTF-8 to ISO-8859-1:
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
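The 0xFFFD characters themselves are the clue: U+FFFD is the Unicode replacement character, which decoders substitute for byte sequences that are not valid UTF-8. If the file was saved as ISO-8859-1 but declared as UTF-8, every extended character decodes to it. A quick Node.js sketch of that mismatch (the browser's parser behaves the same way on the page bytes):

```javascript
// "ä" is the single byte 0xE4 in ISO-8859-1, but 0xE4 on its own
// is an invalid UTF-8 sequence, so decoding it as UTF-8 yields
// the replacement character U+FFFD -- the "diamond with a ? mark".
var latin1Bytes = Buffer.from([0xE4]);
var decodedAsUtf8 = latin1Bytes.toString('utf8');
console.log(decodedAsUtf8 === '\ufffd'); // true
```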
A better approach is to HTML-encode all extended characters in your HTML, like this:
function encodeHTML(str) {
    var aStr = str.split(''),
        i = aStr.length,
        aRet = [];
    while (i--) { // note: i--, not --i, or the first character is skipped
        var iC = aStr[i].charCodeAt(0);
        if (iC < 65 || iC > 127 || (iC > 90 && iC < 97)) {
            // not an ASCII letter: emit a numeric character reference
            aRet.push('&#' + iC + ';');
        } else {
            aRet.push(aStr[i]);
        }
    }
    return aRet.reverse().join('');
}
Mind you, this function will encode everything that is not [a-zA-Z]. It will encode Doppelgänger as Doppelg&#228;nger, for example.
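The same idea can be written more compactly with a regular-expression replace; this variant is my own sketch, not part of the answer above, and encodes exactly the characters outside [a-zA-Z]:

```javascript
function encodeHTML(str) {
    // Replace every character that is not an ASCII letter
    // with its numeric character reference.
    return str.replace(/[^a-zA-Z]/g, function (c) {
        return '&#' + c.charCodeAt(0) + ';';
    });
}

console.log(encodeHTML("Doppelgänger!")); // "Doppelg&#228;nger&#33;"
```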
Is the page sent with a UTF-8 charset? .innerHTML has never given me any trouble with UTF-8.