I have a string "\\u003c", which belongs to the UTF-8 charset. I am unable to decode it to Unicode because of the presence of double backslashes. How do I get "\u003c" from "\\u003c"? I am using Java.
I tried
myString.replace("\\\\", "\\");
but could not achieve what I wanted.
This is my code:
String myString = FileUtils.readFileToString(file);
String a = myString.replace("\\\\", "\\");
byte[] utf8 = a.getBytes();
// Convert from UTF-8 to Unicode
a = new String(utf8, "UTF-8");
System.out.println("Converted string is:"+a);
and the content of the file is:
\u003c
Try using,
Another option, capture one of the two slashes and replace both slashes with the captured group:
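A minimal sketch of what this description suggests, assuming the input really holds two literal backslashes before u003c. The class and method names are made up for illustration and are not taken from the answer:

```java
public class CaptureGroupDemo {
    // Regex (\\)\\ : capture the first backslash, match the second one too.
    // "$1" writes back only the captured backslash, so each pair becomes one.
    static String collapseDoubleBackslashes(String input) {
        return input.replaceAll("(\\\\)\\\\", "$1");
    }

    public static void main(String[] args) {
        // This literal has two backslashes at runtime (7 characters in total).
        String input = "\\\\u003c";
        System.out.println(input + " -> " + collapseDoubleBackslashes(input));
    }
}
```

The pattern literal needs eight backslashes because both the Java compiler and the regex engine treat the backslash as an escape character; the replacement needs none because it only refers to the captured group.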
You can use String#replaceAll.
It looks weird because the first argument is a string defining a regular expression, and \ is a special character both in string literals and in regular expressions. To actually put a \ in our search string, we need to escape it (\\) in the literal. But to actually put a \ in the regular expression, we have to escape it at the regular expression level as well. So to literally get \\ in a string, we need to write \\\\ in the string literal; and to get two literal \\ to the regular expression engine, we need to escape those as well, so we end up with \\\\\\\\. That is: myString.replaceAll("\\\\\\\\", "\\\\").
In the replacement parameter, even though it's not a regex, it still treats \ and $ specially, and so we have to escape them in the replacement as well. So to get one backslash in the replacement, we need four in that string literal.
"\\u003c" does not "belong to the UTF-8 charset" at all. It is six characters: '\', 'u', '0', '0', '3', and 'c'. The real question here is why the double backslashes are there at all. Or are they really there? Is your problem perhaps something completely different? If the String "\\u003c" is in your source code, there are no double backslashes in it at all at runtime, and whatever your problem may be, it doesn't concern decoding in the presence of double backslashes.
Regarding the problem of "replacing double backslashes with single backslashes" or, more generally, "replacing a simple string containing \ with a different simple string containing \" (which is not entirely the OP's problem, but part of it):
Most of the answers in this thread mention replaceAll, which is the wrong tool for the job here. The easier tool is replace, but confusingly, the OP states that replace("\\\\", "\\") doesn't work for him; that's perhaps why all the answers focus on replaceAll.
Important note for people with a JavaScript background: replace(CharSequence, CharSequence) in Java does replace ALL occurrences of a substring, unlike in JavaScript, where it only replaces the first one!
On the other hand, replaceAll(String regex, String replacement) treats both parameters as more than regular strings: the first is a regular expression, and in the second \ and $ are special (this is because $ refers to captured regex groups and \ escapes the next character, hence if you want to use them literally, you need to escape them).
In other words, the first and second parameters of replace and replaceAll behave differently. For replace you need to double the \ in both params (standard escaping of a backslash in a string literal), whereas in replaceAll you need to quadruple it (standard string escape + function-specific escape).
To sum up, for simple replacements, one should stick to replace("\\\\", "\\") (it needs only one escaping, not two).
https://ideone.com/ANeMpw
https://www.ideone.com/Fj4RCO
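The escaping levels discussed above can be compared side by side. This is only a sketch: the class and method names are hypothetical, and the last variant uses the standard helpers Pattern.quote and Matcher.quoteReplacement as one possible way to avoid writing the second level of escaping by hand.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BackslashEscapingDemo {
    // Literal replacement: only string-literal escaping is needed.
    static String withReplace(String s) {
        return s.replace("\\\\", "\\");
    }

    // Regex replacement: every backslash is escaped twice,
    // once for the string literal and once for the regex/replacement syntax.
    static String withReplaceAll(String s) {
        return s.replaceAll("\\\\\\\\", "\\\\");
    }

    // Same regex-based call, but the library adds the second level of escaping.
    static String withQuotedReplaceAll(String s) {
        return s.replaceAll(Pattern.quote("\\\\"), Matcher.quoteReplacement("\\"));
    }

    public static void main(String[] args) {
        String oneBackslash = "\\u003c";    // 6 characters at runtime
        String twoBackslashes = "\\\\u003c"; // 7 characters at runtime

        System.out.println(oneBackslash.length());   // prints 6
        System.out.println(twoBackslashes.length()); // prints 7

        // All three calls turn the doubled backslash into a single one.
        System.out.println(withReplace(twoBackslashes));
        System.out.println(withReplaceAll(twoBackslashes));
        System.out.println(withQuotedReplaceAll(twoBackslashes));
    }
}
```

The first two length checks also illustrate the earlier point that a "\\u003c" literal in source code contains only one backslash once the program runs.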
Not sure if you're still looking for a solution to your problem (since you have an accepted answer) but I will still add my answer as a possible solution to the stated problem:
OUTPUT:
Here is an online demo of the above code
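One way to approach the stated problem, sketched under the assumption that the goal is to turn the six-character text read from the file (backslash, u, 003c) into the character it names. This is not necessarily the code this answer used; the class and method names are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UnicodeEscapeDecoder {
    // Matches a literal backslash, the letter u, and exactly four hex digits.
    private static final Pattern ESCAPE = Pattern.compile("\\\\u([0-9a-fA-F]{4})");

    // Replaces every escape sequence in the text with the character it encodes.
    static String decode(String text) {
        Matcher m = ESCAPE.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            char decoded = (char) Integer.parseInt(m.group(1), 16);
            // quoteReplacement keeps a decoded backslash or dollar sign literal.
            m.appendReplacement(out, Matcher.quoteReplacement(String.valueOf(decoded)));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        // Stands in for the text read from the file: one backslash, then u003c.
        String fileContent = "\\u003c";
        System.out.println("Converted string is:" + decode(fileContent)); // prints "Converted string is:<"
    }
}
```

Calling getBytes() and then new String(bytes, "UTF-8"), as in the question, cannot do this conversion, because the escape sequence is ordinary text rather than an encoding of the bytes.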