.NET string comparison with collation

Posted 2019-08-01 15:10

Question:

I have two different strings (XXÈ and XXE). Is there any way to compare them using a collation (in this case UTF8 general CI, under which I need them to be equal)? I've seen a few examples involving MSSQL or SQLite, but that would add an unnecessary dependency to my project. So my question is: is there any way to do this in pure .NET (specifically C#)?
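One built-in option, not taken from the thread itself: culture-sensitive comparison in System.Globalization can ignore case and accents with no database dependency. This is a sketch that assumes the invariant culture is an acceptable stand-in for a "general" collation; it approximates, but is not guaranteed to reproduce, a SQL engine's UTF8 general CI rules. The `Collation` class and its `AreEqual` method are hypothetical names.

```csharp
using System.Globalization;

public static class Collation
{
    // IgnoreCase folds case; IgnoreNonSpace ignores nonspacing marks
    // such as the grave accent in "È".
    private const CompareOptions Options =
        CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace;

    // Hypothetical helper: true when a and b compare as equal under the
    // invariant culture with the options above.
    public static bool AreEqual(string a, string b) =>
        string.Compare(a, b, CultureInfo.InvariantCulture, Options) == 0;
}
```

This relies on the runtime's globalization data (ICU, or NLS on older Windows setups). If the application runs in globalization-invariant mode, culture-sensitive comparisons behave ordinally and the accent is no longer ignored.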

Update:

Take any decent SQL engine as an example. When you create a table, you can choose its collation. In our case, XXÈ and XXE would both be stored in the table with different binary representations (depending on the encoding), but a search for XXE would also match XXÈ.
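The table-with-a-collation behaviour can be approximated in memory by giving a `HashSet<string>` a culture-aware comparer, which then plays the role of the column collation. A sketch, assuming a .NET Core / .NET 5+ runtime (for `StringComparer.Create` with `CompareOptions`) and the invariant culture; `CollatedTable`, `Create` and `Find` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

public static class CollatedTable
{
    // The comparer acts as the "column collation": case- and
    // accent-insensitive comparison under the invariant culture.
    private static readonly StringComparer Comparer = StringComparer.Create(
        CultureInfo.InvariantCulture,
        CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace);

    // Loads the values into a set that uses the collation for equality and hashing.
    public static HashSet<string> Create(IEnumerable<string> values) =>
        new HashSet<string>(values, Comparer);

    // Returns the stored spelling that matches key, or the empty string if none does.
    public static string Find(HashSet<string> table, string key) =>
        table.TryGetValue(key, out var stored) ? stored : string.Empty;
}
```

Looking up "XXE" returns the stored "XXÈ", mirroring the SQL search. The culture-aware comparer also produces equal hash codes for strings it considers equal, which is what lets the set behave like a collated index.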

My case is very similar. I have a UTF-8 text file containing some strings. I want to display the values on screen, sorted (where the collation again matters), and let the user search them. The collation used for the search will be a user option.
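Sorting and searching with a user-selectable collation can both be driven by a single `CompareOptions` value, so the two stay consistent. A sketch, assuming the invariant culture and that the "collation option" maps onto culture-sensitive flags such as `IgnoreCase` and `IgnoreNonSpace`; `CollatedList`, `Sort` and `Search` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

public static class CollatedList
{
    // Sorts lines with a culture-aware comparer built from the chosen options,
    // e.g. CompareOptions.None or CompareOptions.IgnoreCase | CompareOptions.IgnoreNonSpace.
    public static List<string> Sort(IEnumerable<string> lines, CompareOptions options)
    {
        StringComparer comparer = StringComparer.Create(CultureInfo.InvariantCulture, options);
        return lines.OrderBy(line => line, comparer).ToList();
    }

    // Keeps the lines that contain the query under the same options,
    // so "XXE" also finds "XXÈ" when IgnoreNonSpace is set.
    public static List<string> Search(IEnumerable<string> lines, string query, CompareOptions options)
    {
        CompareInfo compareInfo = CultureInfo.InvariantCulture.CompareInfo;
        return lines.Where(line => compareInfo.IndexOf(line, query, options) >= 0).ToList();
    }
}
```

Storing the user's choice as a `CompareOptions` value (for example `None` for accent- and case-sensitive matching, `IgnoreCase | IgnoreNonSpace` for something close to "general CI") means the on-screen order and the search results always follow the same rules.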

Answer 1:

You could use String.Normalize and a little LINQ:

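```csharp
using System.Globalization;
using System.Linq;
using System.Text;

string initial = "XXÈ";
// FormD splits "È" into "E" plus a combining grave accent (U+0300).
string normal = initial.Normalize(NormalizationForm.FormD);

// Keep only the base characters, dropping the combining marks.
var withoutDiacritics = normal.Where(
    c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
string final = new string(withoutDiacritics.ToArray()).Normalize(NormalizationForm.FormC);
bool equals = "XXE".Equals(final); // true
```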

Reference: http://www.blackwasp.co.uk/RemoveDiacritics.aspx
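The answer's approach can also be wrapped into a reusable key function; lower-casing the result adds the case-insensitive part of "CI". This is a sketch with a hypothetical `SearchKey.Fold` name. It only removes accents that Unicode decomposes into a base letter plus a combining mark: letters such as "ø" or "ß" have no such decomposition and pass through unchanged, so it is a rough approximation rather than a full collation.

```csharp
using System.Globalization;
using System.Linq;
using System.Text;

public static class SearchKey
{
    // Hypothetical key function: decompose (FormD), drop nonspacing marks,
    // recompose (FormC), then lower-case with the invariant culture.
    public static string Fold(string text)
    {
        string decomposed = text.Normalize(NormalizationForm.FormD);
        var baseChars = decomposed.Where(
            c => CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
        return new string(baseChars.ToArray())
            .Normalize(NormalizationForm.FormC)
            .ToLowerInvariant();
    }
}
```

Keys can be computed once per line of the file, after which a search is a plain ordinal check such as `SearchKey.Fold(line).Contains(SearchKey.Fold(query))`.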