Convert special characters to normal

Posted 2020-07-23 08:55

Question:

I need a way to convert special characters like this:

Helloæ

To normal characters. So this word would end up being Helloae. So far I have tried HttpUtility.Decode, or a method that would convert UTF8 to win1252, but nothing worked. Is there something simple and generic that would do this job?

Thank you.

EDIT

I have tried implementing those two methods using posts here on OC. Here are the methods:

public static string ConvertUTF8ToWin1252(string _source)
{
    Encoding utf8 = new UTF8Encoding();
    Encoding win1252 = Encoding.GetEncoding(1252);

    byte[] input = _source.ToUTF8ByteArray();
    byte[] output = Encoding.Convert(utf8, win1252, input);

    return win1252.GetString(output);
}

// It should be noted that this method is expecting UTF-8 input only,
// so you probably should give it a more fitting name.
private static byte[] ToUTF8ByteArray(this string _str)
{
    Encoding encoding = new UTF8Encoding();
    return encoding.GetBytes(_str);
}

But it did not work. The string stays the same.

Answer 1:

See: Does .NET transliteration library exists?

UnidecodeSharpFork

Usage:

var result = "Helloæ".Unidecode();
Console.WriteLine(result); // Prints Helloae


Answer 2:

There is no direct mapping between æ and ae; they are completely different Unicode code points. If you need to do this, you'll most likely have to write a function that maps the offending code points to the strings you want.

Per the comments you may need to take a two stage approach to this:

  1. Remove the diacritics and combining characters per the link to the possible duplicate
  2. Map any characters left that are not combining to alternate strings
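
// MapSpecialChar is an illustrative name; extend the cases as needed.
static string MapSpecialChar(char badChar)
{
    switch (badChar)
    {
        case 'æ':
            return "ae";
        case 'ø':
            return "oe";
        // and so on
        default:
            return badChar.ToString(); // leave unmapped characters unchanged
    }
}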
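
The two stages above can be combined into one pass. The following is only a rough sketch, not a complete transliteration: the class name TextFolding, the method name ToPlainAscii, and the contents of the replacement table are all made up for illustration, and the table covers just a handful of characters. It relies on string.Normalize and CharUnicodeInfo.GetUnicodeCategory from the .NET base class library, and it only drops non-spacing marks, so other combining categories would need to be added if your input contains them.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class TextFolding
{
    // Hypothetical lookup table for characters that have no canonical
    // decomposition (normalization leaves them untouched). Extend as needed.
    private static readonly Dictionary<char, string> Replacements =
        new Dictionary<char, string>
        {
            { 'æ', "ae" }, { 'Æ', "AE" },
            { 'ø', "oe" }, { 'Ø', "OE" },
            { 'ß', "ss" },
        };

    public static string ToPlainAscii(string source)
    {
        // Stage 1: decompose, so that 'é' becomes 'e' followed by U+0301.
        string decomposed = source.Normalize(NormalizationForm.FormD);
        var result = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Drop the combining marks produced by the decomposition.
            if (CharUnicodeInfo.GetUnicodeCategory(c) == UnicodeCategory.NonSpacingMark)
                continue;

            // Stage 2: map the remaining special characters by hand.
            if (Replacements.TryGetValue(c, out string replacement))
                result.Append(replacement);
            else
                result.Append(c);
        }

        return result.ToString();
    }
}
```

With this sketch, TextFolding.ToPlainAscii("Helloæ") returns "Helloae" and TextFolding.ToPlainAscii("Crème brûlée") returns "Creme brulee". Characters that are neither decomposable nor in the table pass through unchanged, so the result is not guaranteed to be pure ASCII.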