Normalization of Strings With String.ToUpperInvari

2019-02-09 03:29发布

问题:

I am currently storing normalized versions of strings in my SQL Server database in lower case. For example, in my Users table, I have a UserName and a LoweredUserName field. Depending on the context, I either use T-SQL's LOWER() function or C#'s String.ToLower() method to generate the lower case version of the user name to fill the LoweredUserName field. According to Microsoft's guidelines and Visual Studio's code analysis rule CA1308, I should be using C#'s String.ToUpperInvariant() instead of ToLower(). According to Microsoft, this is both a performance and globalization issue: converting to upper case is safe, while converting to lower case can cause a loss of information (for example, the Turkish 'I' problem).

If I move to using ToUpperInvariant for string normalization, I will have to change my database schema as well, since my schema is based on Microsoft's ASP.NET Membership framework (see this related question), which normalizes strings to lower case.

Isn't Microsoft contradicting itself by telling us to use upper case normalization in C#, while it's own code in the Membership tables and procedures is using lower case normalization? Should I switch everything to upper case normalization, or just continue using lower case normalization?

回答1:

To answer your first question, yes Microsoft is a bit inconsistent. To answer your second question, no do not switch anything until you have confirmed that this is causing a bottleneck in your application.

Think how much forward progress you can make on you project instead of wasting time switching everything. Your development time is much more valuable than the savings you would get from such a change.

Remember:

Premature optimization is the root of all evil (or at least most of it) in programming. - Donald Knuth



回答2:

According to CA1308, the reason to do this is that some characters cannot be roundtrip converted from upper to lower case. The important thing is that you always move in one direction, so if your standard is to always move to lower case then there is no reason to change it.



回答3:

Continue using lower case normalization. Only change to conform to Microsoft standards if a large issue develops.

This is unfortunate, but worthwhile. Sadly, Microsoft "standards" tend to be poorly considered and somewhat less than consistent; experience with them has shown that unless there is a compelling reason, it's best to simply stick with what works while it works. Note that this is generally NOT true of non-Microsoft technologies; but the arbitrariness of the Microsoft "standards" makes them worth avoiding.

Edit: I should clarify here; my opinion of Microsoft is very low, from long experience with their standards. As was pointed out in the comments, I don't have particular references to point out about "everybody else other than Microsoft"; this just comes from my personal experience. Your Mileage May Vary widely. This answer should be considered really just my opinion. Sorry for not making that more clear earlier.