I have come across this line of legacy code, which I am trying to figure out:
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
As far as I can understand, it is encoding and decoding using the same charset.
How is this different from the following?
String newString = oldString;
Is there any scenario in which the two lines will have different outputs?
P.S.: Just to clarify, yes, I am aware of the excellent article on encoding by Joel Spolsky!
This could be a complicated way of doing
String newString = new String(oldString);
This shortens the String if the underlying char[] used is much longer.
More specifically, however, it checks that every character can be encoded as UTF-8.
There are some "characters" you can have in a String which cannot be encoded, and these are turned into '?'.
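Any unpaired surrogate, that is, a lone char between \uD800 and \uDFFF without its matching partner, cannot be encoded and will be turned into '?', so the round-tripped String no longer equals the original.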
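A minimal sketch of that check, assuming a hypothetical helper named roundTrip that wraps the legacy line (it uses StandardCharsets.UTF_8 instead of the "UTF-8" name only to avoid the checked exception). The sample strings are illustrative, not taken from the original code base.

```java
import java.nio.charset.StandardCharsets;

class RoundTripDemo {
    // Hypothetical helper: encode to UTF-8 bytes and decode again,
    // which is what the legacy line does.
    static String roundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String plain = "hello";
        String loneSurrogate = "\uD800";    // high surrogate with no low surrogate
        String validPair = "\uD83D\uDE00";  // a complete surrogate pair

        // Ordinary text survives the round trip unchanged.
        System.out.println(roundTrip(plain).equals(plain));
        // A lone surrogate cannot be encoded, so the result differs.
        System.out.println(roundTrip(loneSurrogate).equals(loneSurrogate));
        // The replacement character produced by the encoder.
        System.out.println(roundTrip(loneSurrogate));
        // A properly paired surrogate encodes and decodes cleanly.
        System.out.println(roundTrip(validPair).equals(validPair));
    }
}
```

Run as written, this should print true, false, ? and true, in that order. Only the unpaired surrogate is altered.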
This line of code here:
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
constructs a new String object (i.e. a copy of oldString), while this line of code:
String newString = oldString;
declares a new variable of type java.lang.String and initializes it to refer to the same String object as the variable oldString.
Absolutely:
vs.
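The difference between copying and aliasing can be sketched with a reference comparison. This is an illustration under assumptions, not the code from the original answer: the class and method names are hypothetical, and StandardCharsets.UTF_8 stands in for the "UTF-8" name.

```java
import java.nio.charset.StandardCharsets;

class IdentityDemo {
    // Plain assignment: both variables refer to the same String object.
    static boolean sameObjectAfterAssignment(String oldString) {
        String newString = oldString;
        return newString == oldString;
    }

    // Encode/decode round trip: 'new' always creates a distinct object.
    static boolean sameObjectAfterRoundTrip(String oldString) {
        String newString =
            new String(oldString.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
        return newString == oldString;
    }

    public static void main(String[] args) {
        String oldString = "example";
        System.out.println(sameObjectAfterAssignment(oldString)); // same reference
        System.out.println(sameObjectAfterRoundTrip(oldString));  // different reference
    }
}
```

The first call reports true and the second false, even though equals() would report the two strings as equal for ordinary text like this.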
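a_horse_with_no_name (see comment) is right, of course. The equivalent of
String newString = new String(oldString.getBytes("UTF-8"), "UTF-8");
is
String newString = new String(oldString);
minus the subtle difference with respect to the encoding that Peter Lawrey explains in his answer.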