This question is similar to How to emulate MySQLs utf8_general_ci collation in PHP string comparisons but I want the function for vb.net rather than PhP.
Recently I make a lot of supposedly unique key.
Some of the keys are equivalent under UTF8 unicode collation.
For example, look at these 2 key:
byers-street-bistro__38.15_-79.07 byers-street-bistro__38.15_-79.07
If I paste that into front page, and look at the source code you'll see
byers-street-bistro__38.15_-79.07
byers-street-bistro__38.15_-79.07
Note: In stack overflow they still look different.
I know it's not the same. I guess even in stack exchange it doesn't show. Say I have 1 million such records and I want to test whether 2 string WILL be declared the same by MySQL UTF8 collation. I want to know that before uploading. How do I do that.
So vb.net think that those are different keys. When we created mysql query and upload that to database, the database complain it's the same key. Just one complain and the upload of 1 million databases will be stuck.
We don't even know what the hell is ? Where can we look that up anyway?
Anyway, I want a function that when given 2 strings will tell me whether they will count as the same or not.
If possible we want a function that convert strings into their most "standard" form.
For example, seems to encode nothing and the function would recoqnize all those nothing character and eliminate that.
Is there such thing?
So far this is what I do. I need something more comprehensive.
Private Function StraightenQuotesReplacement() As Generic.Dictionary(Of String, String)
Static replacement As Generic.Dictionary(Of String, String)
If replacement Is Nothing Then
replacement = New Generic.Dictionary(Of String, String)
replacement.Add(ChrW(&H201C), """")
replacement.Add(ChrW(&H201D), """")
replacement.Add(ChrW(&H2018), "'")
replacement.Add(ChrW(&H2019), "'")
End If
Return replacement
End Function
<Extension()>
Public Function straightenQuotes(ByVal somestring As String) As String
For Each key In StraightenQuotesReplacement.Keys
somestring = somestring.Replace(key, StraightenQuotesReplacement.Item(key))
Next
Return somestring
End Function
<Extension()>
Public Function germanCharacter(ByVal s As String) As String
Dim t = s
t = t.Replace("ä", "ae")
t = t.Replace("ö", "oe")
t = t.Replace("ü", "ue")
t = t.Replace("Ä", "Ae")
t = t.Replace("Ö", "Oe")
t = t.Replace("Ü", "Ue")
t = t.Replace("ß", "ss")
Return t
End Function
<Extension()>
Public Function japaneseCharacter(ByVal s As String) As String
Dim t = s
t = t.Replace("ヶ", "ケ")
Return t
End Function
<Extension()>
Public Function greekCharacter(ByVal s As String) As String
Dim t = s
t = t.Replace("ς", "σ")
t = t.Replace("ι", "ί")
Return t
End Function
<Extension()>
Public Function franceCharacter(ByVal s As String) As String
Dim t = s
t = t.Replace("œ", "oe")
Return t
End Function
<Extension()>
Public Function RemoveDiacritics(ByVal s As String) As String
Dim normalizedString As String
Dim stringBuilder As New StringBuilder
normalizedString = s.Normalize(NormalizationForm.FormD)
Dim i As Integer
Dim c As Char
For i = 0 To normalizedString.Length - 1
c = normalizedString(i)
If CharUnicodeInfo.GetUnicodeCategory(c) <> UnicodeCategory.NonSpacingMark Then
stringBuilder.Append(c)
End If
Next
Return stringBuilder.ToString()
End Function
<Extension()>
Public Function badcharacters(ByVal s As String) As String
Dim t = s
t = t.Replace(ChrW(8206), "")
Return t
End Function
<Extension()>
Public Function sanitizeUTF8_Unicode(ByVal str As String) As String
Return str.ToLower.removeDoubleSpaces.SpacetoDash.EncodeUrlLimited.straightenQuotes.RemoveDiacritics.greekCharacter.germanCharacter
End Function