I am writing these lines of code:
String name1 = fname.getText().toString();
String name2 = sname.getText().toString();
aru = 0;
count1 = name1.length();
count2 = name2.length();
for (i = 0; i < count1; i++)
{
    for (j = 0; j < count2; j++)
    {
        if (name1.charAt(i)==name2.charAt(j))
            aru++;
    }
    if(aru!=0)
        aru++;
}
I want to compare the Characters of two Strings ignoring the case. Simply using IgnoreCase doesn't work. Adding '65' ASCII value doesn't work either. How do I do this?
You can change the case of both Strings before using them, like this:
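A minimal sketch of that idea, assuming the goal is to count matching character pairs as in the question's nested loops. The class name LowercaseFirst and the method countMatches are hypothetical, and Locale.ROOT is an assumption chosen to keep the result independent of the device's default locale:

```java
import java.util.Locale;

final class LowercaseFirst {
    // Counts index pairs (i, j) whose characters are equal once both
    // inputs have been lowercased with a locale-independent mapping.
    static int countMatches(String first, String second) {
        String a = first.toLowerCase(Locale.ROOT);
        String b = second.toLowerCase(Locale.ROOT);
        int matches = 0;
        for (int i = 0; i < a.length(); i++) {
            for (int j = 0; j < b.length(); j++) {
                if (a.charAt(i) == b.charAt(j)) {
                    matches++;
                }
            }
        }
        return matches;
    }
}
```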
Then continue with the rest of the operation.
This is how the JDK does it (adapted from OpenJDK 8, String.java/regionMatches):
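A sketch of the per-character check, written from memory of the ignore-case branch in regionMatches rather than copied from the JDK source. The class and method names are hypothetical. The idea is to compare the uppercased characters first and then, as a last check, their lowercased forms, because uppercasing alone is not reliable for some alphabets (the JDK source mentions Georgian):

```java
final class RegionMatchSketch {
    // Case-insensitive comparison of two UTF-16 code units.
    static boolean charsEqualIgnoreCase(char c1, char c2) {
        if (c1 == c2) {
            return true;
        }
        char u1 = Character.toUpperCase(c1);
        char u2 = Character.toUpperCase(c2);
        if (u1 == u2) {
            return true;
        }
        // Final check on the lowercased forms of the uppercased characters.
        return Character.toLowerCase(u1) == Character.toLowerCase(u2);
    }
}
```

In the question's loop, the condition would then become charsEqualIgnoreCase(name1.charAt(i), name2.charAt(j)).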
I suppose that works for Turkish also?
You have to consider the Turkish I problem when comparing characters / lowercasing / uppercasing:
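A small illustration of the problem, assuming a Turkish locale built with Locale.forLanguageTag("tr-TR"). The class and method names are hypothetical. Under Turkish rules an uppercase I lowercases to the dotless ı (U+0131), so the result depends on which locale is passed:

```java
import java.util.Locale;

final class TurkishIDemo {
    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    // Lowercases using Turkish casing rules: 'I' becomes dotless U+0131.
    static String lowerTurkish(String s) {
        return s.toLowerCase(TURKISH);
    }

    // Lowercases using the locale-neutral root locale: 'I' becomes 'i'.
    static String lowerInvariant(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```

With these helpers, lowerTurkish("TITLE") and lowerInvariant("TITLE") produce different strings, so comparing them with equals fails.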
I suggest converting to String and using toLowerCase with an invariant locale (in most cases at least).
public static final Locale InvariantLocale = Locale.ROOT; // same as new Locale("", "", "")
str.toLowerCase(InvariantLocale)
Compare C#'s string.ToLower() and string.ToLowerInvariant().
Note: Don't use String.equalsIgnoreCase; see http://nikolajlindberg.blogspot.co.il/2008/03/beware-of-java-comparing-turkish.html
You can't actually do the job quite right with toLowerCase, either on a string or on a character. The problem is that there are variant glyphs in either upper or lower case, and depending on whether you uppercase or lowercase, your glyphs may or may not be preserved. It's not even clear what you mean when you say that two variants of a lower-case glyph are compared ignoring case: are they or are they not the same? (Note that there are also mixed-case glyphs: \u01c5, \u01c8, \u01cb, \u01f2 or Dž, Lj, Nj, Dz, but any method suggested here will work on those as long as they should count as the same as their fully upper or fully lower case variants.)
There is an additional problem with using Char: there are some 80 code points not representable with a single Char that are upper/lower case variants (40 of each), at least as detected by Java's code point upper/lower casing. You therefore need to get the code points and change the case on these.
But code points don't help with the variant glyphs.
Anyway, here's a complete list of the glyphs that are problematic due to variants, showing how they fare against 6 variant methods:
toLowerCase
toUpperCase
toLowerCase
toUpperCase
equalsIgnoreCase
toLowerCase(toUpperCase)
(or vice versa)
For these methods, S means that the variants are treated the same as each other, and D means the variants are treated as different from each other.
Complicating this still further is that there is no way to get the Turkish I's right (i.e. the dotted versions are different from the undotted versions) unless you know you're in Turkish; none of these methods gives correct behavior, nor can any of them unless you know the locale (i.e. non-Turkish: i and I are the same ignoring case; Turkish: not).
Overall, using toUpperCase gives you the closest approximation, since you have only five uppercase variants (or four, not counting Turkish).
You can also try to specifically intercept those five troublesome cases and call toUpperCase(toLowerCase(c)) on them alone. If you choose your guards carefully (just toUpperCase if c < 0x130 || c > 0x212B, then work through the other alternatives) you can get only a ~20% speed penalty for characters in the low range (as compared to ~4x if you convert single characters to strings and equalsIgnoreCase them) and only about a 2x penalty if you have a lot in the danger zone. You still have the locale problem with dotted I, but otherwise you're in decent shape. Of course if you can use equalsIgnoreCase on a larger string, you're better off doing that.
Here is sample Scala code that does the job:
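The guard idea can also be sketched in Java, roughly as follows. This is an illustrative rendering under assumptions, not the answer's own listing: the class and method names are hypothetical, it works on code points (int) so that characters outside a single char are covered, and for simplicity it applies toUpperCase(toLowerCase(c)) to every code point inside the guarded range instead of singling out the troublesome ones. It does not address the Turkish dotted I.

```java
final class GuardedCaseFold {
    // Outside 0x130..0x212B a plain uppercase mapping is used; inside that
    // range the code point is lowercased first so that uppercase variants
    // such as the Kelvin sign (U+212A) collapse onto their ordinary letter.
    static int fold(int codePoint) {
        if (codePoint < 0x130 || codePoint > 0x212B) {
            return Character.toUpperCase(codePoint);
        }
        return Character.toUpperCase(Character.toLowerCase(codePoint));
    }

    // Compares two code points ignoring case, using the guarded fold.
    static boolean equalsIgnoreCase(int a, int b) {
        return a == b || fold(a) == fold(b);
    }
}
```

To walk a String by code point instead of by char, name1.codePointAt(i) together with Character.charCount can be used to advance the index.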
You could put both chars in lower case and then compare them.
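For instance, a minimal helper along these lines (the class and method names are hypothetical, and the locale and variant-glyph caveats discussed in the other answers still apply):

```java
final class CharCompare {
    // Lowercases both chars, then compares them.
    static boolean sameIgnoringCase(char a, char b) {
        return Character.toLowerCase(a) == Character.toLowerCase(b);
    }
}
```

The condition in the question's inner loop would then read CharCompare.sameIgnoringCase(name1.charAt(i), name2.charAt(j)).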
Generic methods to compare a char at a position between 2 strings with ignore case.
Function call
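A minimal sketch of what such a method and its call might look like. The class name, method name, and parameter order are assumptions, and returning false for null inputs or out-of-range indexes is a design choice made here rather than something the answer specifies:

```java
final class CharAtCompare {
    // Compares first.charAt(firstIndex) with second.charAt(secondIndex),
    // ignoring case. Returns false for null strings or invalid indexes.
    static boolean charAtEqualsIgnoreCase(String first, int firstIndex,
                                          String second, int secondIndex) {
        if (first == null || second == null) {
            return false;
        }
        if (firstIndex < 0 || firstIndex >= first.length()
                || secondIndex < 0 || secondIndex >= second.length()) {
            return false;
        }
        return Character.toLowerCase(first.charAt(firstIndex))
                == Character.toLowerCase(second.charAt(secondIndex));
    }

    public static void main(String[] args) {
        // Compares 'H' in "Hello" with 'h' in "help".
        System.out.println(charAtEqualsIgnoreCase("Hello", 0, "help", 0));
    }
}
```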