In my Android application, I want to compare a UTF-8 string, for example "bãi", with the string the user types into an EditText.
However, if I type "bãi" into the EditText and read the input with edittext.getText().toString(), the returned string looks the same but does not equal "bãi".
I also tried
String input = new String(input.getBytes("UTF-8"), "UTF-8");
but it does not work; input.equals("bãi") still returns false.
Does anyone know how to solve this problem? Thanks for any help.
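In Unicode, certain characters can be represented in more than one way. For example, in the word bãi the middle character can be represented in two ways: as the single precomposed code point U+00E3 (ã), or as the letter a (U+0061) followed by the combining tilde U+0303.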
For display, both should look the same.
For string comparison, this poses a problem. The solution is to normalize the strings first according to Unicode Standard Annex #15 — Unicode Normalization Forms.
Normalization is supported in Java (including Android) by the java.text.Normalizer class.
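Comparing the two representations before and after normalization shows the effect: the raw strings are not equal, but their normalized forms are.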
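As a minimal sketch of such a comparison (not necessarily the code originally posted): the class name NormalizeDemo and the helper equalsNormalized are made up for illustration, and the two literals are assumptions standing in for the constant in your code and the text coming from the EditText. Only Normalizer.normalize and Form.NFD come from the standard java.text API.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class NormalizeDemo {

    // Hypothetical helper: bring both strings to the same normalization
    // form (NFD here) before comparing them.
    static boolean equalsNormalized(String a, String b) {
        return Normalizer.normalize(a, Form.NFD)
                .equals(Normalizer.normalize(b, Form.NFD));
    }

    public static void main(String[] args) {
        // Assumed inputs: the same word written in the two representations.
        String composed = "b\u00E3i";    // precomposed a-with-tilde, 3 chars
        String decomposed = "ba\u0303i"; // a + combining tilde, 4 chars

        // Plain equals compares code units, so the two strings differ.
        System.out.println(composed.equals(decomposed));            // false

        // After normalization both strings have the same code units.
        System.out.println(equalsNormalized(composed, decomposed)); // true
    }
}
```

In an app you would pass edittext.getText().toString() as one of the arguments instead of a literal.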
BTW: The form Form.NFD decomposes the strings, i.e. it creates the longer representation with two code points. Form.NFC would create the shorter form.
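To make the difference between the two forms visible, here is a small sketch that prints the length of the same word in each form. The class name NormalizationForms and the helper lengthIn are hypothetical names for illustration; the input literal is an assumption.

```java
import java.text.Normalizer;
import java.text.Normalizer.Form;

class NormalizationForms {

    // Hypothetical helper: number of UTF-16 code units of s after
    // normalizing it to the given form.
    static int lengthIn(String s, Form form) {
        return Normalizer.normalize(s, form).length();
    }

    public static void main(String[] args) {
        String word = "b\u00E3i"; // assumed input, precomposed form

        // NFD splits the middle character into a + combining tilde.
        System.out.println(lengthIn(word, Form.NFD)); // 4

        // NFC composes it back into a single code point.
        System.out.println(lengthIn(word, Form.NFC)); // 3
    }
}
```

Either form works for comparison as long as both strings are normalized to the same one.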