Windows-1252 encoding - incorrect characters displ

Posted 2019-04-12 08:41

Question:

I have a buffer of characters encoded in Windows-1252. However, when I create a new String with the appropriate encoding, I often get question marks instead of the expected result, e.g.

    byte[] tmps = new byte[] {(byte) 0xfb};
    System.out.println(new String(tmps, 0, 1, "Windows-1252"));

As a result, the system should display the character "û" (a "u" with a "^" above it). Instead it displays "?".

Any ideas?

Answer 1:

First of all, Windows-1252 is a supported encoding:

  • If it weren't, you'd get an UnsupportedEncodingException from new String(..., "Windows-1252"). (That's what the javadoc says!)

  • The Oracle Java documentation says Windows-1252 is in the "Basic Encoding Set": http://docs.oracle.com/javase/7/docs/technotes/guides/intl/encoding.doc.html, http://docs.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html, etc.

I think the most likely problem here is on the output side. Specifically, Java may think that your locale's default charset is ASCII or something else that cannot represent that code point, in which case the character is replaced with "?" when it is written out.
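
A minimal sketch for checking this, assuming a Java 7+ runtime. The class name DefaultCharsetCheck and the helper canEncode are made up for illustration. It reports the default charset, whether that charset can represent U+00FB, and shows the "?" substitution that happens when a charset cannot. On newer JDKs the default charset and the console encoding can differ, so treat the first two lines of output as a hint rather than proof.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
    // Hypothetical helper: true if the charset has a byte sequence for the character.
    static boolean canEncode(Charset cs, char c) {
        return cs.newEncoder().canEncode(c);
    }

    public static void main(String[] args) {
        Charset def = Charset.defaultCharset();
        System.out.println("default charset: " + def.name());
        System.out.println("can encode U+00FB: " + canEncode(def, '\u00fb'));

        // Encoding into a charset that lacks the character substitutes '?' (0x3f).
        byte[] ascii = "\u00fb".getBytes(StandardCharsets.US_ASCII);
        System.out.println("as US-ASCII: 0x" + Integer.toHexString(ascii[0])); // prints "as US-ASCII: 0x3f"
    }
}
```

If the second line prints false, the "?" is produced while writing the output, not while decoding the bytes.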

One way to eliminate Windows-1252 as the cause of the problem is to write the equivalent string using a Unicode escape; e.g.

    System.out.println("\u00fb");
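
Another way to take the console out of the picture is to print the decoded character's code point using only ASCII characters. This is a sketch, not the asker's code: the class name DecodeCheck and the helper decode are hypothetical.

```java
import java.nio.charset.Charset;

public class DecodeCheck {
    // Hypothetical helper: decode raw bytes as Windows-1252.
    static String decode(byte[] bytes) {
        return new String(bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        String s = decode(new byte[] {(byte) 0xfb});
        // The output is pure ASCII, so it survives any console encoding.
        System.out.println(String.format("U+%04X", (int) s.charAt(0))); // prints "U+00FB"
    }
}
```

If this prints U+00FB, the decoding step is correct and the "?" comes from how the string is written out.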


Answer 2:

I've found the solution.

Open Run > Run Configurations, select your application under Java Application, go to the Common tab, and set the encoding to UTF-8.

Since then, both Windows-1250 and Windows-1252 characters seem to be displayed correctly.
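
For runs outside the IDE, a possible code-level alternative is to wrap System.out in a PrintStream with an explicit encoding. This is only a sketch, and it assumes the console or terminal actually interprets the bytes as UTF-8. The class name Utf8Out and the helper printAs are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

public class Utf8Out {
    // Hypothetical helper: returns the bytes a PrintStream emits for the text
    // when it is configured with the named charset.
    static byte[] printAs(String text, String charsetName) throws UnsupportedEncodingException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        PrintStream ps = new PrintStream(sink, true, charsetName);
        ps.print(text);
        ps.flush();
        return sink.toByteArray();
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // Write to standard output as UTF-8 regardless of the default charset.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.println("\u00fb");

        // U+00FB is two bytes in UTF-8: 0xC3 0xBB.
        System.out.println(printAs("\u00fb", "UTF-8").length); // prints "2"
    }
}
```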



Tags: java encoding