Wrong File Encoding in JVM after Linux Update

Posted 2019-05-28 02:05

Question:

After updating Linux and Java (1.6.0.13 -> 1.6.0.45), Java processes use a different file encoding (system property file.encoding).

The OS version is new. Unfortunately I no longer know the previous version, but I can tell that the update went wrong: my colleague first updated using the 32-bit OS version, and then we reinstalled the x64 version.

>uname -a
Linux <hostname> 2.6.31.5-0.1-desktop #1 SMP PREEMPT 2009-10-26 15:49:03 +0100 x86_64 x86_64 x86_64 GNU/Linux

Locale Settings

>locale
LANG=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_NUMERIC="en_US.ISO8859-1"
LC_TIME="en_US.ISO8859-1"
LC_COLLATE="en_US.ISO8859-1"
LC_MONETARY="en_US.ISO8859-1"
LC_MESSAGES="en_US.ISO8859-1"
LC_PAPER="en_US.ISO8859-1"
LC_NAME="en_US.ISO8859-1"
LC_ADDRESS="en_US.ISO8859-1"
LC_TELEPHONE="en_US.ISO8859-1"
LC_MEASUREMENT="en_US.ISO8859-1"
LC_IDENTIFICATION="en_US.ISO8859-1"
LC_ALL=

Test program

public class Test
{
  public static void main(String[] args)
  {
    System.out.println(System.getProperty("file.encoding"));
  }
}

If I start this test program, it returns ANSI_X3.4-1968. On other machines with the same locale settings it returns ISO8859-1. Even if I start it with an explicit environment variable, the result remains unchanged. The only working solution is to use the -Dfile.encoding option, but I don't want to adjust all the scripts that use Java (tomcat, maven, ant, hudson, ...). I want to restore the old behaviour, in which the file encoding of Java programs was taken from the system locale definition.

>java Test
ANSI_X3.4-1968

>LANG=de_DE.ISO8859-1 java Test
ANSI_X3.4-1968

>java -Dfile.encoding=ISO8859-1 Test
ISO8859-1

At least C programs get the correct encoding and do not use ANSI_X3.4-1968:

>idn --debug  --quiet "a.de"
Charset `ISO-8859-1'.
....

Does anybody know whether there is a JVM-specific setting that might have been lost during the OS or Java update?

Any help appreciated.

Answer 1:

Thanks to icza. I googled a little for JAVA_OPTS and found that I should use JAVA_TOOL_OPTIONS instead. See "How do I use the JAVA_OPTS environment variable?"

Alternatively there is _JAVA_OPTIONS; see "Running java with JAVA_OPTS env variable".

Both work just fine, for the runtime and for the compiler:

>export JAVA_TOOL_OPTIONS=-Dfile.encoding=ISO8859-1
>java Test
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=ISO8859-1
ISO8859-1

>javac Test.java
Picked up JAVA_TOOL_OPTIONS: -Dfile.encoding=ISO8859-1

>export _JAVA_OPTIONS=-Dfile.encoding=ISO8859-1
>java Test
Picked up _JAVA_OPTIONS: -Dfile.encoding=ISO8859-1
ISO8859-1

>javac Test.java
Picked up _JAVA_OPTIONS: -Dfile.encoding=ISO8859-1


Answer 2:

Just hit something similar (on Debian). It was caused by the default LANG/LC settings being for a locale not configured in /etc/locale.gen.

To fix, I uncommented the appropriate line from /etc/locale.gen and ran sudo locale-gen.

I'm surprised that Java doesn't give any warning about this. Perl, for example, makes a loud noise to tell you something's broken:

$ LANG=pl_PL.UTF-8 perl -e ''                
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
    LANGUAGE = "en_GB:en",
    LC_ALL = (unset),
    LANG = "pl_PL.UTF-8"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
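
Since the JVM stays silent, a small check can produce a similar warning. The following is only a sketch: the class name LocaleEncodingCheck and its helper methods are made up for illustration, and it assumes that a mismatch between the codeset named in the environment and the JVM default charset points to a locale that is not installed. On JDK 18 and later the default charset is UTF-8 regardless of the locale, so the check is only meaningful on older JVMs.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of the JDK: compares the codeset requested
// by the environment with the charset the JVM actually picked.
public class LocaleEncodingCheck
{
  // Returns the codeset part of a POSIX locale name, e.g. "ISO8859-1" for
  // "en_US.ISO8859-1", or null if the name has no codeset (e.g. "C").
  public static String codesetOf(String localeName)
  {
    if (localeName == null) return null;
    int dot = localeName.indexOf('.');
    if (dot < 0) return null;
    int at = localeName.indexOf('@', dot);
    String codeset = (at < 0) ? localeName.substring(dot + 1)
                              : localeName.substring(dot + 1, at);
    return codeset.length() == 0 ? null : codeset;
  }

  // POSIX precedence for the character type category:
  // LC_ALL first, then LC_CTYPE, then LANG.
  public static String effectiveCtype(String lcAll, String lcCtype, String lang)
  {
    if (lcAll != null && lcAll.length() > 0) return lcAll;
    if (lcCtype != null && lcCtype.length() > 0) return lcCtype;
    return lang;
  }

  // True if the requested codeset resolves to the same charset the JVM uses.
  // Unknown or illegal names count as a mismatch.
  public static boolean agrees(String codeset, Charset jvmDefault)
  {
    try {
      return Charset.forName(codeset).equals(jvmDefault);
    } catch (IllegalArgumentException e) {
      return false;
    }
  }

  public static void main(String[] args)
  {
    String locale = effectiveCtype(System.getenv("LC_ALL"),
                                   System.getenv("LC_CTYPE"),
                                   System.getenv("LANG"));
    String requested = codesetOf(locale);
    Charset actual = Charset.defaultCharset();

    System.out.println("locale          = " + locale);
    System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
    System.out.println("default charset = " + actual);

    if (requested != null && !agrees(requested, actual)) {
      System.err.println("warning: locale asks for " + requested
          + " but the JVM uses " + actual
          + "; check that the locale is installed (locale -a)");
    }
  }
}
```

Running it as `java LocaleEncodingCheck` on an affected machine should print the warning, and it should stay quiet once the locale has been generated.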

Also, to explain some of the other behaviour: ANSI_X3.4-1968 is just an official (and somewhat opaque) way of saying "ASCII". "ISO-8859-1" is the "usual" 8-bit superset of ASCII, known by various names including "Western" or "Latin 1". It is the nearest thing to a "standard" character set as far as operating systems like DOS or older versions of Windows were concerned.
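
The naming can be checked from Java itself. This is a minimal sketch (the class name CharsetAliases and its two helpers are invented for the example) that resolves both names through java.nio.charset.Charset and shows what happens to a non-ASCII character under each charset:

```java
import java.nio.charset.Charset;

// Illustrative only: resolves charset aliases and encodes one character.
public class CharsetAliases
{
  // Canonical name the JVM uses for the given alias.
  public static String canonicalName(String alias)
  {
    return Charset.forName(alias).name();
  }

  // First encoded byte of the text, as an unsigned value (0..255).
  public static int firstByte(String text, String charsetName)
  {
    return text.getBytes(Charset.forName(charsetName))[0] & 0xFF;
  }

  public static void main(String[] args)
  {
    System.out.println(canonicalName("ANSI_X3.4-1968")); // US-ASCII
    System.out.println(canonicalName("ISO8859-1"));      // ISO-8859-1

    // U+00E9 (e with acute accent) does not exist in ASCII, so it is
    // replaced by '?' (63); in ISO-8859-1 it is the single byte 0xE9 (233).
    System.out.println(firstByte("\u00e9", "ANSI_X3.4-1968")); // 63
    System.out.println(firstByte("\u00e9", "ISO8859-1"));      // 233
  }
}
```

This is why a JVM that falls back to ANSI_X3.4-1968 silently turns accented characters into question marks when writing files or printing to the console.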