Upper vs Lower Case

Posted 2019-01-01 05:37

When doing case-insensitive comparisons, is it more efficient to convert the string to upper case or lower case? Does it even matter?

It is suggested in this SO post that C# is more efficient with ToUpper because "Microsoft optimized it that way." But I've also read this argument that converting ToLower vs. ToUpper depends on what your strings contain more of, and that typically strings contain more lower case characters which makes ToLower more efficient.

In particular, I would like to know:

  • Is there a way to optimize ToUpper or ToLower such that one is faster than the other?
  • Is it faster to do a case-insensitive comparison between upper or lower case strings, and why?
  • Are there any programming environments (e.g. C, C#, Python, whatever) where one case is clearly better than the other, and why?

10 Answers
无色无味的生活
#2 · 2019-01-01 05:39

From Microsoft on MSDN:

Best Practices for Using Strings in the .NET Framework

Recommendations for String Usage

Why? From Microsoft:

Normalize strings to uppercase

There is a small group of characters that when converted to lowercase cannot make a round trip.

What is an example of such a character that cannot make a round trip?

  • Start: Greek Rho Symbol (U+03f1) ϱ
  • Uppercase: Capital Greek Rho (U+03a1) Ρ
  • Lowercase: Small Greek Rho (U+03c1) ρ

ϱ , Ρ , ρ

That is why, if you want to do case-insensitive comparisons, you convert the strings to uppercase, and not lowercase.
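The round trip above can be checked in a few lines. This is a minimal sketch in Python rather than C#, assuming Python's built-in `str.upper()` and `str.lower()` (which follow the Unicode case mappings); the helper names `equal_via_upper` and `equal_via_lower` are made up for illustration.

```python
# The three rho characters from the list above.
RHO_SYMBOL = "\u03f1"    # ϱ GREEK RHO SYMBOL
CAPITAL_RHO = "\u03a1"   # Ρ GREEK CAPITAL LETTER RHO
SMALL_RHO = "\u03c1"     # ρ GREEK SMALL LETTER RHO

upper = RHO_SYMBOL.upper()    # ϱ maps up to Ρ
round_trip = upper.lower()    # Ρ maps down to ρ, not back to ϱ


def equal_via_upper(a: str, b: str) -> bool:
    """Hypothetical helper: compare after normalizing both sides to uppercase."""
    return a.upper() == b.upper()


def equal_via_lower(a: str, b: str) -> bool:
    """Hypothetical helper: compare after normalizing both sides to lowercase."""
    return a.lower() == b.lower()


# Uppercasing sends both ϱ and ρ to Ρ, so they compare equal.
# Lowercasing leaves ϱ and ρ as they are, so they compare unequal.
print(equal_via_upper(RHO_SYMBOL, SMALL_RHO), equal_via_lower(RHO_SYMBOL, SMALL_RHO))
```

So normalizing to lowercase treats ϱ and ρ as different letters, while normalizing to uppercase treats them as the same one.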

伤终究还是伤i
#3 · 2019-01-01 05:40

Converting to either upper case or lower case in order to do case-insensitive comparisons is incorrect due to "interesting" features of some cultures, particularly Turkish, where dotted i and dotless ı are distinct letters. Instead, use a StringComparer with the appropriate options.

MSDN has some great guidelines on string handling. You might also want to check that your code passes the Turkey test.
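To see which characters cause the trouble, here is a small sketch in Python. Note the assumption: Python's `str.upper()` and `str.lower()` are not culture-sensitive, unlike .NET's `ToUpper()`/`ToLower()`, so this only shows what culture-neutral casing does to the Turkish letters; it is not a Turkish-aware comparison. `naive_equals` is a made-up helper.

```python
DOTLESS_I = "\u0131"      # ı LATIN SMALL LETTER DOTLESS I
DOTTED_CAP_I = "\u0130"   # İ LATIN CAPITAL LETTER I WITH DOT ABOVE

# Culture-neutral rules collapse the Turkish dotless ı onto ASCII "I" ...
dotless_upper = DOTLESS_I.upper()
# ... and lowercasing that gives the dotted "i", so ı does not round-trip.
dotless_round_trip = dotless_upper.lower()

# İ lowercases to "i" followed by U+0307 COMBINING DOT ABOVE (two code points).
dotted_lower = DOTTED_CAP_I.lower()


def naive_equals(a: str, b: str) -> bool:
    """Hypothetical helper: the convert-then-compare approach the answer warns about."""
    return a.upper() == b.upper()


# Strings that differ only in ı versus i compare equal here,
# although they are different letters in Turkish.
print(naive_equals("\u0131s\u0131", "isi"))
```

In .NET the picture is different again because `ToUpper()` follows the current culture, which is exactly why the answer recommends an explicit comparer.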

EDIT: Note Neil's comment around ordinal case-insensitive comparisons. This whole realm is pretty murky :(

梦寄多情
#4 · 2019-01-01 05:42

Microsoft has optimized ToUpperInvariant(), not ToUpper(). The difference is that the invariant version uses the invariant culture's casing rules instead of the current culture's, so its result does not depend on the user's locale. If you need to do case-insensitive comparisons on strings that may vary in culture, use Invariant, otherwise the performance of invariant conversion shouldn't matter.

I can't say whether ToUpper() or ToLower() is faster though. I've never tried it since I've never had a situation where performance mattered that much.

高级女魔头
#5 · 2019-01-01 05:42

It really shouldn't ever matter. With ASCII characters, it definitely doesn't matter - it's just a few comparisons and a bit flip for either direction. Unicode might be a little more complicated, since there are some characters that change case in weird ways, but there really shouldn't be any difference unless your text is full of those special characters.
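For the ASCII case, the "few comparisons and a bit flip" look roughly like this. A minimal Python sketch; `ascii_to_upper`, `ascii_to_lower` and `CASE_BIT` are illustrative names, not library functions.

```python
CASE_BIT = 0x20  # the single bit that separates "A" (0x41) from "a" (0x61)


def ascii_to_upper(text: str) -> str:
    """Clear the case bit on a-z; leave every other character untouched."""
    return "".join(
        chr(ord(c) & ~CASE_BIT) if "a" <= c <= "z" else c
        for c in text
    )


def ascii_to_lower(text: str) -> str:
    """Set the case bit on A-Z; leave every other character untouched."""
    return "".join(
        chr(ord(c) | CASE_BIT) if "A" <= c <= "Z" else c
        for c in text
    )


print(ascii_to_upper("Hello, World 42!"))
print(ascii_to_lower("Hello, World 42!"))
```

Both directions do the same work per character: one range check and one bit operation, which is why neither direction has an inherent advantage for ASCII.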

零度萤火
#6 · 2019-01-01 05:42

It depends. As stated above, for plain ASCII only, it's identical. In .NET, read about and use String.Compare; it's correct for the i18n stuff (languages, cultures and Unicode). If you know anything about the likelihood of the input, use the more common case.

Remember, if you are doing multiple string compares, length is an excellent first discriminator.
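A rough sketch of the length check as a pre-filter, in Python. It assumes ASCII-only input: with full Unicode, case conversion can change the length (for example `"ß".upper()` is `"SS"`), so the shortcut is not safe there. `find_matches` is a made-up name.

```python
def find_matches(needle: str, candidates: list[str]) -> list[str]:
    """Return the candidates equal to needle ignoring case (ASCII input assumed)."""
    wanted_length = len(needle)
    needle_upper = needle.upper()  # convert the needle once, not once per candidate
    return [
        c for c in candidates
        # The cheap length test runs first; `and` short-circuits,
        # so candidates of the wrong length are never converted.
        if len(c) == wanted_length and c.upper() == needle_upper
    ]


print(find_matches("hello", ["HELLO", "Hello!", "help", "hElLo"]))
```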

无色无味的生活
#7 · 2019-01-01 05:49

If you are doing string comparison in C#, it is significantly faster to use .Equals() with a case-insensitive StringComparison (for example StringComparison.OrdinalIgnoreCase) instead of converting both strings to upper or lower case. Another big plus for using .Equals() is that no extra memory is allocated for the two new upper/lower case strings.
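To illustrate why a direct comparison avoids the two temporary strings, here is a sketch in Python of an ASCII-only, character-by-character comparison. It is not how .NET implements .Equals(), and in CPython a loop like this will be slower than the built-in string methods; the point is only the shape of the algorithm. `equals_ignore_ascii_case` and `_fold` are made-up names.

```python
def _fold(code: int) -> int:
    """Map the code points of A-Z onto a-z; return anything else unchanged."""
    return code | 0x20 if 0x41 <= code <= 0x5A else code


def equals_ignore_ascii_case(a: str, b: str) -> bool:
    """Compare two strings ignoring ASCII case without building converted copies."""
    if len(a) != len(b):
        return False
    # all() stops at the first mismatch, so unequal strings exit early.
    return all(
        x == y or _fold(ord(x)) == _fold(ord(y))
        for x, y in zip(a, b)
    )


print(equals_ignore_ascii_case("Hello", "hELLO"))
```

Converting first always walks both strings in full and allocates two copies; comparing in place can stop at the first differing character.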
