How to replace � in a string

2020-01-26 06:28发布

I have a string that contains a character � I haven't been able to replace it correctly.

String.replace("�", "");

doesn't work, does anyone know how to remove/replace the � in the string??

10条回答
再贱就再见
2楼-- · 2020-01-26 06:40

That's the Unicode Replacement Character, \uFFFD. (info)

Something like this should work:

String strImport = "For some reason my �double quotes� were lost.";
strImport = strImport.replaceAll("\uFFFD", "\"");
查看更多
何必那么认真
3楼-- · 2020-01-26 06:40

Change the Encoding to UTF-8 while parsing .This will remove the special characters

查看更多
疯言疯语
4楼-- · 2020-01-26 06:41

As others have said, you posted 3 characters instead of one. I suggest you run this little snippet of code to see what's actually in your string:

public static void dumpString(String text)
{
    for (int i=0; i < text.length(); i++)
    {
        System.out.println("U+" + Integer.toString(text.charAt(i), 16) 
                           + " " + text.charAt(i));
    }
}

If you post the results of that, it'll be easier to work out what's going on. (I haven't bothered padding the string - we can do that by inspection...)

查看更多
何必那么认真
5楼-- · 2020-01-26 06:44

dissect the URL code and unicode error. this symbol came to me as well on google translate in the armenian text and sometimes the broken burmese.

查看更多
乱世女痞
6楼-- · 2020-01-26 06:50

You are asking to replace the character "�" but for me that is coming through as three characters 'ï', '¿' and '½'. This might be your problem... If you are using Java prior to Java 1.5 then you only get the UCS-2 characters, that is only the first 65K UTF-8 characters. Based on other comments, it is most likely that the character that you are looking for is '�', that is the Unicode replacement character. This is the character that is "used to replace an incoming character whose value is unknown or unrepresentable in Unicode".

Actually, looking at the comment from Kathy, the other issue that you might be having is that javac is not interpreting your .java file as UTF-8, assuming that you are writing it in UTF-8. Try using:

javac -encoding UTF-8 xx.java

Or, modify your source code to do:

String.replaceAll("\uFFFD", "");
查看更多
女痞
7楼-- · 2020-01-26 06:58

Character issues like this are difficult to diagnose because information is easily lost through misinterpretation of characters via application bugs, misconfiguration, cut'n'paste, etc.

As I (and apparently others) see it, you've pasted three characters:

codepoint   glyph   escaped    windows-1252    info
=======================================================================
U+00ef      ï       \u00ef     ef,             LATIN_1_SUPPLEMENT, LOWERCASE_LETTER
U+00bf      ¿       \u00bf     bf,             LATIN_1_SUPPLEMENT, OTHER_PUNCTUATION
U+00bd      ½       \u00bd     bd,             LATIN_1_SUPPLEMENT, OTHER_NUMBER

To identify the character, download and run the program from this page. Paste your character into the text field and select the glyph mode; paste the report into your question. It'll help people identify the problematic character.

查看更多
登录 后发表回答