Question:

When I parse the XML, it contains abnormal hex characters, so I tried to replace them with a space. But it doesn't work at all.

Original character : �

hex code : (253, 255)

code :

xmlData = String.replace(String.fromCharCode(253,255)," ");

return xmlData;

I'd like to remove the "ýÿ" characters from the description. Has anyone else had trouble replacing a hex character with a space?

Based on the answers, I've modified the code as follows:

testData = String.fromCharCode(253,255);
xmlData = xmlData.replace(String.fromCharCode(253,255), " "); 
console.log(xmlData);

but it still shows '�' on the screen.

Do you know why this still happens?

Answer 1:

The character code is actually 255 * 256 + 253 = 65533, so you would get something like this:

xmlData = xmlData.replace(String.fromCharCode(65533)," ");

String.fromCharCode(253,255) produces a string of two characters ("ýÿ"), not the single character with code 65533.
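
To make the difference concrete, here is a minimal sketch that compares the two strings and then replaces the character. The sample xmlData value is made up for illustration, and the global regular expression is an addition not used in the answer above; a plain string pattern only replaces the first match.

```javascript
// String.fromCharCode treats each argument as a separate UTF-16 code unit.
const twoChars = String.fromCharCode(253, 255);     // "ýÿ"
const replacementChar = String.fromCharCode(65533); // "\uFFFD"

console.log(twoChars.length);            // 2
console.log(replacementChar.length);     // 1
console.log(255 * 256 + 253 === 0xFFFD); // true

// Hypothetical sample input containing the replacement character twice.
const xmlData = "<description>caf\uFFFD and th\uFFFD</description>";

// A string pattern replaces only the first occurrence.
const firstOnly = xmlData.replace(replacementChar, " ");

// A global regular expression replaces every occurrence.
const all = xmlData.replace(/\uFFFD/g, " ");

console.log(firstOnly.includes("\uFFFD")); // true
console.log(all.includes("\uFFFD"));       // false
```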

Answer 2:

You should call replace() on a string instance, not on String:

var testData = String.fromCharCode(253,255);
var xmlData = testData.replace(String.fromCharCode(253,255), " ");
alert(xmlData);

Working example: http://jsfiddle.net/StURS/2/
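
As a rough sketch of the same point applied to the question's data: replace() lives on String.prototype and returns a new string, so it has to be called on the variable holding the text and the result has to be kept. The helper name cleanXml and the sample input are hypothetical, and the pattern assumes the unwanted character is 65533 (U+FFFD).

```javascript
// Hypothetical helper: returns a copy of the input with every U+FFFD
// character replaced by a space.
function cleanXml(xmlData) {
    // replace() is called on the string instance, not on the String constructor.
    return xmlData.replace(/\uFFFD/g, " ");
}

// Made-up sample input for illustration.
const original = "a\uFFFDb";
const cleaned = cleanXml(original);

// Strings are immutable: the original is unchanged, the result is a new string.
console.log(original === cleaned);          // false
console.log(original.includes("\uFFFD"));   // true
console.log(cleaned);                       // "a b"
```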

Answer 3:

I just had this problem with a messed-up SQL dump that contained both valid and invalid UTF-8 codes, which forced a more manual conversion. As the examples above don't address how to find the affected characters and replace them with better matches, I figured I'd put my two cents in here for those struggling with similar encoding problems. The following code:

parses my sql-dump
splits according to queries
finds character codes above 256
outputs the codes and the string with context where the code appears
replaces the Swedish ÅÄÖ with correct codes using regular expressions
outputs the replaced string for control

"use strict";

const fs = require("fs");

const fn = "my_problematic_sql_dump.sql";
// Read the whole dump and split it into individual queries.
const lines = fs.readFileSync(fn).toString().split(/;\n/);

const Aring = new RegExp(String.fromCharCode(65533) +
    "\\" + String.fromCharCode(46) + "{1,3}", 'g');
const Auml = new RegExp(String.fromCharCode(65533) +
    String.fromCharCode(44) + "{1,3}", 'g');
const Ouml = new RegExp(String.fromCharCode(65533) +
    String.fromCharCode(45) + "{1,3}", 'g');

// Scan every query for character codes above 256.
lines.forEach((l, i) => {
    for (let ii = 0; ii < l.length; ii++) {
        if (l.charCodeAt(ii) > 256) {
            console.log("\n Invalid code at line " + i + ":");
            // Print the offending code and the three codes that follow it.
            console.log("Code: ", l.charCodeAt(ii), l.charCodeAt(ii + 1),
                l.charCodeAt(ii + 2), l.charCodeAt(ii + 3));

            // Show 20 characters of context starting at the offending one.
            let core_str = l.substring(ii, ii + 20);
            console.log("String: ", core_str);

            // Strip line breaks, then map each marker sequence to its letter.
            core_str = core_str.replace(/[\r\n]/g, "")
                .replace(Ouml, "Ö")
                .replace(Auml, "Ä")
                .replace(Aring, "Å");
            console.log("After replacements: ", core_str);
        }
    }
});

The resulting output will look something like this:

 Invalid code at line 18:
Code:  65533 45 82 65533
String:  �-R�,,LDRALEDIGT', N
After replacements:  ÖRÄLDRALEDIGT', N

 Invalid code at line 18:
Code:  65533 44 44 76
String:  �,,LDRALEDIGT', NULL
After replacements:  ÄLDRALEDIGT', NULL

 Invalid code at line 19:
Code:  65533 46 46 46
String:  �...ker med fam till
After replacements:  Åker med fam till

A few things that I found worth noting:

The 65533 is sometimes followed by a varying number of regular characters that decide the actual character, hence the {1,3}.
The Aring pattern contains a ".", which matches any character in a regular expression, so it needs the additional \\ to be treated as a literal dot.
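
Both notes can be checked with a small sketch. It rewrites the three patterns from this answer as regex literals, wraps the replacements in a helper (the name fixSwedish is hypothetical), and runs them on the strings from the sample output. The last lines show, under the same assumptions, what happens when the dot is left unescaped.

```javascript
// The same patterns as in the answer, written as regex literals.
const Aring = /\uFFFD\.{1,3}/g; // escaped dot: matches a literal "."
const Auml = /\uFFFD,{1,3}/g;
const Ouml = /\uFFFD-{1,3}/g;

// Hypothetical helper that applies the three replacements in one pass each.
function fixSwedish(text) {
    return text
        .replace(Ouml, "Ö")
        .replace(Auml, "Ä")
        .replace(Aring, "Å");
}

// Strings taken from the sample output above.
const fixed1 = fixSwedish("\uFFFD-R\uFFFD,,LDRALEDIGT");
const fixed2 = fixSwedish("\uFFFD...ker med fam till");
console.log(fixed1); // ÖRÄLDRALEDIGT
console.log(fixed2); // Åker med fam till

// Without the escape, "." matches any character, so the pattern also
// swallows real text after the marker sequence.
const unescaped = /\uFFFD.{1,3}/g;
const broken = "\uFFFD,,LDRALEDIGT".replace(unescaped, "Å");
console.log(broken); // ÅDRALEDIGT
```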