From java.lang.StringCoding:
String csn = (charsetName == null) ? "ISO-8859-1" : charsetName;
This is what is used by java.lang.String.getBytes(). On Linux with JDK 7, I was always under the impression that UTF-8 is the default charset?
Thanks
To elaborate on Skeet's answer (which is of course the correct one):
In java.lang.String's source, getBytes() calls
StringCoding.encode(char[] ca, int off, int len)
which on its first line looks up the name of the default charset (csn). Then (not immediately, but inevitably) it calls
static byte[] StringCoding.encode(String charsetName, char[] ca, int off, int len)
which is where the line you quoted comes from, passing csn as the charsetName. So in that line, charsetName will be the default charset if one exists.
The parameterless String.getBytes() method doesn't use ISO-8859-1 by default. It will use the default platform encoding, if that can be determined. If, however, that's either missing or is an unrecognized encoding, it falls back to ISO-8859-1 as a "default default".
You should very rarely see this in practice. Normally the platform default encoding will be detected correctly.
However, I'd strongly suggest that you specify an explicit character encoding every time you perform an encode or decode operation. Even if you want the platform default, specify that explicitly.
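As a minimal sketch of that advice (the class and method names here are made up for illustration, and StandardCharsets requires Java 7 or later), encoding and decoding with an explicitly named charset could look like this:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Encode with an explicitly named charset instead of relying on the default.
    static byte[] encodeUtf8(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode with the same charset that was used for encoding.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Even when the platform default is what you want, say so explicitly.
    static byte[] encodePlatformDefault(String text) {
        return text.getBytes(Charset.defaultCharset());
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café"
        byte[] utf8 = encodeUtf8(text);
        System.out.println(utf8.length + " bytes in UTF-8");
        System.out.println("round trip ok: " + decodeUtf8(utf8).equals(text));
        System.out.println(encodePlatformDefault(text).length
                + " bytes in " + Charset.defaultCharset());
    }
}
```

The byte count of the last call depends on the machine it runs on, which is exactly why the explicit variants are easier to reason about.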
It is a bit complicated ...
Java tries to use the default character encoding to return bytes using String.getBytes().
.... Here is the tricky part (which is probably never going to come into play) ....
If the system cannot decode or encode strings using the default charset (UTF-8 or another one), then there will be a fallback to ISO-8859-1. If the fallback does not work ... the system will fail!
.... Really ... (gasp!) ... Could it crash if my specified charset cannot be used, and UTF-8 or ISO-8859-1 are also unusable?
Yes. According to the comments in the StringCoding.encode(...) method, if ISO-8859-1 cannot be found either, the method reports the error and then calls System.exit(1).
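The fallback chain described above can be sketched roughly as follows. This is not the JDK's code: encodeWithFallback is a hypothetical helper, and it throws an exception where the real method would exit the JVM, so that the sketch is safe to run.

```java
import java.io.UnsupportedEncodingException;

public class FallbackSketch {

    // Hypothetical helper, not the JDK implementation: try the preferred
    // charset name first, then ISO-8859-1, and give up if neither works.
    static byte[] encodeWithFallback(String text, String preferredCharsetName) {
        try {
            return text.getBytes(preferredCharsetName);
        } catch (UnsupportedEncodingException e) {
            // Preferred charset is unknown to this JVM: fall through.
        }
        try {
            return text.getBytes("ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            // The real code exits the JVM at this point; the sketch throws instead.
            throw new IllegalStateException("ISO-8859-1 charset not available", e);
        }
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // "é"
        // Known charset: encoded as UTF-8.
        System.out.println(encodeWithFallback(text, "UTF-8").length);
        // Unknown charset name (made up for this example): falls back to ISO-8859-1.
        System.out.println(encodeWithFallback(text, "x-no-such-charset").length);
    }
}
```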
So, why is there an intentional fallback to ISO-8859-1 in the getBytes() method?
It is possible, although not probable, that the user's JVM may not support decoding and encoding in UTF-8 or the charset specified on JVM startup.
Then, is the default charset used properly in the String class during getBytes()?
No. However, the better question is ...
Does String.getBytes() deliver what it promises?
The contract as defined in the Javadoc is correct.
The good news (and better way of doing things)
It is always advised to explicitly specify "ISO-8859-1" or "US-ASCII" or "UTF-8" or whatever character set you want when converting bytes into Strings or vice versa, unless you have previously obtained the default charset and made 100% sure it is the one you need.
Use an overload that takes a charset instead, such as String.getBytes(String charsetName) or String.getBytes(Charset charset).
To find the default for your system, just use Charset.defaultCharset().
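A small sketch for checking the default on a given machine (the class and helper names are made up; file.encoding is the system property that commonly reflects or overrides the default, though its exact role varies between JDK versions):

```java
import java.nio.charset.Charset;

public class ShowDefaultCharset {

    // Returns the canonical name of the charset this JVM treats as its default.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // May be null or differ from the line above on some JDK versions.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```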
Hope that helps.
That's for compatibility reasons.
Historically, all Java methods on Windows and Unix that did not specify a charset used the common one at the time, that is "ISO-8859-1".
As mentioned by Isaac and the Javadoc, the default platform encoding is used (see Charset.java).
Always specify the charset when doing string to bytes or bytes to string conversion.
Do this even though, as is the case for String.getBytes(), you can still find non-deprecated methods that do not take a charset (most of them were deprecated when Java 1.1 appeared). Just like with endianness, the platform format is irrelevant; what matters is the standard of the storage format.
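To illustrate why the storage format is what matters, here is a small sketch (class and method names are hypothetical) in which bytes stored as UTF-8 are read back once with the matching charset and once with the wrong one:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StorageFormatDemo {

    // The storage format is fixed to UTF-8, whatever platform writes the bytes.
    static byte[] store(String text) {
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // Decode stored bytes with whichever charset the reader assumes.
    static String load(byte[] stored, Charset assumed) {
        return new String(stored, assumed);
    }

    public static void main(String[] args) {
        String original = "na\u00efve"; // "naïve"
        byte[] stored = store(original);

        // Reader agrees with the storage format: text survives.
        System.out.println(load(stored, StandardCharsets.UTF_8).equals(original));
        // Reader assumes a different charset: the two UTF-8 bytes of the
        // accented letter are decoded as two separate characters.
        System.out.println(load(stored, StandardCharsets.ISO_8859_1).equals(original));
    }
}
```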