Encoding difficulties

Posted 2019-09-11 07:28

Question:

I'm having some encoding problems with code I'm working on. An encrypted string is received and decoded with ISO-8859-1. This string is then stored in a DB that uses UTF-8 encoding. When the string is retrieved it's still ISO-8859-1, and there are no problems. The issue is that I also need to be able to retrieve this string as UTF-8, but I haven't been successful.

I've tried to convert the string from ISO to UTF-8 when retrieved from the DB using this method:

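import java.io.UnsupportedEncodingException;

// Encodes the String as ISO-8859-1 bytes and tries to decode those bytes as UTF-8
private String convertIsoToUtf8(String isoLatin) {
    try {
        return new String(isoLatin.getBytes("ISO_8859_1"), "UTF_8");
    } catch (UnsupportedEncodingException e) {
        return isoLatin;
    }
}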

Unfortunately, in this case the special characters are just displayed as question marks.

Original string: Test æøå
Example output after retrieving from the DB and converting to UTF-8: Test ???
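To see what the conversion in the question does to this example, here is a minimal sketch (the class and method names are hypothetical, and it uses StandardCharsets instead of charset-name strings). ISO-8859-1 encodes æ, ø and å as the single bytes 0xE6, 0xF8 and 0xE5, which are not a valid UTF-8 sequence, so decoding them as UTF-8 yields the replacement character U+FFFD. A console that can't show U+FFFD may well print ? in its place.

```java
import java.nio.charset.StandardCharsets;

public class IsoToUtf8Demo {

    // Same idea as convertIsoToUtf8 in the question (hypothetical name here):
    // encode as ISO-8859-1, then decode those bytes as if they were UTF-8.
    static String reinterpretLatin1BytesAsUtf8(String s) {
        byte[] latin1 = s.getBytes(StandardCharsets.ISO_8859_1); // æ, ø, å -> 0xE6, 0xF8, 0xE5
        return new String(latin1, StandardCharsets.UTF_8);       // invalid UTF-8 -> U+FFFD
    }

    public static void main(String[] args) {
        String original = "Test \u00E6\u00F8\u00E5"; // "Test æøå", escaped so source-file encoding doesn't matter
        String converted = reinterpretLatin1BytesAsUtf8(original);
        // Print code points so the result doesn't depend on the console's encoding.
        converted.codePoints().forEach(cp -> System.out.printf("U+%04X ", cp));
        System.out.println();
    }
}
```

Printing code points rather than the String itself separates "what the String contains" from "what the console can render".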

Update: After reading the link provided in the comment, I managed to get it right. Since the DB is already UTF-8 encoded, all I needed to do was this:

return new String(isoLatin.getBytes("UTF-8"));

Answer 1:

When you already have a String object, it is usually too late to correct any encoding issues, since some information may already have been lost. Think of characters that can't be mapped one-to-one onto Java's internal UTF-16 representation.
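A minimal sketch of why a String can be too late to fix, assuming the bytes were written as ISO-8859-1 but decoded as UTF-8 (class and method names are hypothetical). The invalid bytes become U+FFFD during decoding, so re-encoding that String can't give the original bytes back; only decoding the original bytes with the right charset does.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LossyDecodeDemo {

    // Decodes the bytes with the wrong charset, then tries to "undo" it by
    // encoding the resulting String back with that same charset.
    static byte[] wrongDecodeThenReencode(byte[] original) {
        String wrong = new String(original, StandardCharsets.UTF_8);
        return wrong.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "æøå" as ISO-8859-1 bytes
        byte[] latin1 = {(byte) 0xE6, (byte) 0xF8, (byte) 0xE5};
        byte[] roundTripped = wrongDecodeThenReencode(latin1);
        System.out.println(Arrays.equals(latin1, roundTripped)); // false: the original bytes are gone

        // Decoding the original bytes with the charset they were written in works.
        String correct = new String(latin1, StandardCharsets.ISO_8859_1);
        System.out.println(correct.equals("\u00E6\u00F8\u00E5")); // true
    }
}
```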

The correct place to handle character encoding is the moment you get your Strings: when reading input from a file (set the correct encoding on your InputStreamReader), when converting the byte[] you got from decryption, when reading from the database (this should be handled by your JDBC driver), etc.
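A minimal sketch of decoding at the boundary, with the charset named explicitly each time. The class name is hypothetical, the "decrypted" bytes are simulated and assumed to be ISO-8859-1 plaintext (as the question suggests), and the JDBC side is left out because its encoding settings are driver-specific.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DecodeAtBoundary {

    // Reads all text from a stream, naming the charset explicitly instead of
    // relying on the platform default.
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            char[] buf = new char[4096];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String expected = "Test \u00E6\u00F8\u00E5"; // "Test æøå"

        // File/stream input: decode with the charset the data was written in.
        byte[] fileBytes = expected.getBytes(StandardCharsets.UTF_8);
        String fromStream = readAll(new ByteArrayInputStream(fileBytes), StandardCharsets.UTF_8);
        System.out.println(fromStream.equals(expected)); // true

        // Decrypted byte[] (simulated, assumed ISO-8859-1): turn it into a String once, with that charset.
        byte[] decrypted = expected.getBytes(StandardCharsets.ISO_8859_1);
        String plaintext = new String(decrypted, StandardCharsets.ISO_8859_1);
        System.out.println(plaintext.equals(expected)); // true
    }
}
```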

Also take care to handle the encoding correctly when doing the reverse. While relying on the default encoding might seem to work most of the time, sooner or later you may run into issues that are difficult or impossible to resolve (as you are now).
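For the reverse direction, a minimal sketch (hypothetical class name) that writes text with an explicit charset instead of the default. The default is not fixed: since JDK 18 it is UTF-8 (JEP 400), while older JVMs derive it from the platform, which is one reason code that relies on it can behave differently on another machine.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodeExplicitly {

    // Writes text with an explicit charset rather than the platform default.
    static byte[] write(String text, Charset charset) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(out, charset)) {
            writer.write(text);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String text = "Test \u00E6\u00F8\u00E5"; // "Test æøå"
        byte[] utf8 = write(text, StandardCharsets.UTF_8);
        System.out.println(utf8.length); // 11: each of æ, ø, å takes two bytes in UTF-8
        // Decoding with the same charset restores the text exactly.
        System.out.println(new String(utf8, StandardCharsets.UTF_8).equals(text)); // true
        // Whatever this prints is what implicit conversions on this JVM would use.
        System.out.println("Default charset here: " + Charset.defaultCharset());
    }
}
```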

P.S.: Also keep in mind which tool you are using to display your output: some consoles won't display UTF-16 or UTF-8, so check the encoding settings of the editor you use to view your files, etc. Sometimes your output is correct and just can't be displayed correctly.
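One way to tell a display problem from a data problem is to inspect the String's code points directly and check whether a given charset can represent them at all. A minimal sketch with hypothetical helper names; a console or file using a charset that lacks a character may show ? instead of it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.stream.Collectors;

public class InspectString {

    // Lists the code points of a String as U+XXXX, independent of how any console renders them.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // True if every character of s can be represented in the given charset.
    static boolean representable(String s, Charset charset) {
        return charset.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String s = "\u00E6\u00F8\u00E5"; // "æøå"
        System.out.println(codePoints(s)); // U+00E6 U+00F8 U+00E5
        System.out.println(representable(s, StandardCharsets.US_ASCII)); // false
        System.out.println(representable(s, StandardCharsets.ISO_8859_1)); // true
    }
}
```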