When doing case-insensitive comparisons, is it more efficient to convert the string to upper case or lower case? Does it even matter?
It is suggested in this SO post that C# is more efficient with ToUpper because "Microsoft optimized it that way." But I've also read this argument that converting ToLower vs. ToUpper depends on what your strings contain more of, and that typically strings contain more lower case characters which makes ToLower more efficient.
In particular, I would like to know:
- Is there a way to optimize ToUpper or ToLower such that one is faster than the other?
- Is it faster to do a case-insensitive comparison between upper or lower case strings, and why?
- Are there any programming environments (e.g. C, C#, Python, whatever) where one case is clearly better than the other, and why?
From Microsoft on MSDN:
Why? From Microsoft:
What is an example of such a character that cannot make a round trip?
That is why, if you want to do case-insensitive comparisons, you convert the strings to uppercase, and not lowercase.
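To make the round-trip question concrete, here is a small sketch in Python. It is an illustration under assumptions: Python uses the default Unicode case mappings, which are not guaranteed to match .NET's, and the characters below are my own picks, not necessarily the ones the MSDN passage has in mind. The helper names are hypothetical.

```python
# Characters chosen for illustration; Python applies the default
# (locale-independent) Unicode case mappings.
KELVIN = "\u212a"     # KELVIN SIGN, looks like "K"
SHARP_S = "\u00df"    # LATIN SMALL LETTER SHARP S
DOTLESS_I = "\u0131"  # LATIN SMALL LETTER DOTLESS I


def round_trips_via_lower(ch: str) -> bool:
    """True if lowercasing and then uppercasing returns the original."""
    return ch.lower().upper() == ch


def round_trips_via_upper(ch: str) -> bool:
    """True if uppercasing and then lowercasing returns the original."""
    return ch.upper().lower() == ch


# The Kelvin sign lowercases to a plain "k", which uppercases to a plain "K".
lower_failures = [c for c in (KELVIN, "K") if not round_trips_via_lower(c)]

# Sharp s uppercases to "SS"; dotless i uppercases to a plain "I".
upper_failures = [c for c in (SHARP_S, DOTLESS_I, "k") if not round_trips_via_upper(c)]

print([hex(ord(c)) for c in lower_failures])
print([hex(ord(c)) for c in upper_failures])
```

In this particular implementation, round-trip failures exist in both directions, so the sketch only shows what "cannot make a round trip" means; it does not settle which direction a given platform handles better.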
Converting to either upper case or lower case in order to do case-insensitive comparisons is incorrect due to "interesting" features of some cultures, particularly Turkey. Instead, use a StringComparer with the appropriate options.
MSDN has some great guidelines on string handling. You might also want to check that your code passes the Turkey test.
EDIT: Note Neil's comment around ordinal case-insensitive comparisons. This whole realm is pretty murky :(
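As a rough Python counterpart to "compare, don't convert", the sketch below uses `str.casefold()`, which is designed for caseless matching. This is an analogy, not the .NET `StringComparer` API, and the function name is hypothetical. Note that `casefold()` is culture-insensitive, so it does not apply Turkish rules.

```python
def equals_ignore_case(a: str, b: str) -> bool:
    """Culture-insensitive caseless comparison using Unicode case folding."""
    return a.casefold() == b.casefold()


# Case folding handles mappings that a plain lower() misses.
german_match = equals_ignore_case("Straße", "STRASSE")
naive_match = "Straße".lower() == "STRASSE".lower()

# Default folding does not treat dotless i (U+0131) as equal to "I";
# a Turkish-aware comparison would need culture-specific rules.
turkish_match = equals_ignore_case("\u0131", "I")

print(german_match, naive_match, turkish_match)
```

The Turkish case is why a culture-aware comparer matters when the text is meant to be read under a specific culture's rules.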
Microsoft has optimized ToUpperInvariant(), not ToUpper(). The difference is that the invariant version does not depend on the current culture. If you need to do case-insensitive comparisons on strings that may vary in culture, use Invariant; otherwise the performance of invariant conversion shouldn't matter.
I can't say whether ToUpper() or ToLower() is faster though. I've never tried it, since I've never had a situation where performance mattered that much.
It really shouldn't ever matter. With ASCII characters, it definitely doesn't matter - it's just a few comparisons and a bit flip for either direction. Unicode might be a little more complicated, since there are some characters that change case in weird ways, but there really shouldn't be any difference unless your text is full of those special characters.
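For the ASCII case, the "few comparisons and a bit flip" can be sketched as follows. ASCII upper and lower case letters differ only in bit 0x20, so both directions do the same amount of work. The function names are hypothetical, and this is written for clarity, not speed.

```python
ASCII_CASE_BIT = 0x20  # the single bit that separates "A" (0x41) from "a" (0x61)


def ascii_upper(s: str) -> str:
    """Clear the case bit on a-z; leave every other character untouched."""
    return "".join(
        chr(ord(c) & ~ASCII_CASE_BIT) if "a" <= c <= "z" else c for c in s
    )


def ascii_lower(s: str) -> str:
    """Set the case bit on A-Z; leave every other character untouched."""
    return "".join(
        chr(ord(c) | ASCII_CASE_BIT) if "A" <= c <= "Z" else c for c in s
    )


print(ascii_upper("Hello, World!"))
print(ascii_lower("Hello, World!"))
```

Each direction is one range check plus one bitwise operation per character, which is why neither should be measurably faster on plain ASCII.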
It depends. As stated above, for plain ASCII it's identical. In .NET, read about and use String.Compare; it's correct for the i18n stuff (languages, cultures and Unicode). If you know anything about the likelihood of the input, use the more common case.
Remember, if you are doing multiple string compares, length is an excellent first discriminator.
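A minimal sketch of the length-first idea, assuming ASCII-only case rules (the function names are hypothetical). The length shortcut is only safe when case mapping cannot change the length of a string, which holds for ASCII but not for full Unicode folding (for example, "ß" can fold to "ss").

```python
def _fold_ascii(c: str) -> int:
    """Return the code point of c, with A-Z mapped onto a-z."""
    return ord(c) | 0x20 if "A" <= c <= "Z" else ord(c)


def ascii_equals_ignore_case(a: str, b: str) -> bool:
    """Compare two strings ignoring ASCII case, without building new strings."""
    # Cheap rejection first: different lengths can never match under ASCII rules.
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        # Identical characters need no folding at all.
        if x != y and _fold_ascii(x) != _fold_ascii(y):
            return False
    return True


print(ascii_equals_ignore_case("Hello", "hELLO"))
print(ascii_equals_ignore_case("Hello", "Hello!"))
```

Comparing in place like this also avoids allocating converted copies of both strings.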
If you are doing string comparison in C#, it is significantly faster to use .Equals() (with a case-insensitive StringComparison such as OrdinalIgnoreCase) instead of converting both strings to upper or lower case. Another big plus for using .Equals() is that no extra memory is allocated for the two new upper/lower case strings.