Replacing characters in C# (ascii)

2020-01-30 11:41发布

I got a file with characters like these: à, è, ì, ò, ù - À. What i need to do is replace those characters with normal characters eg: à = a, è = e and so on..... This is my code so far:

StreamWriter sw = new StreamWriter(@"C:/JoinerOutput.csv");
string path = @"C:/Joiner.csv";
string line = File.ReadAllText(path);

if (line.Contains("à"))
{
    string asAscii = Encoding.ASCII.GetString(Encoding.Convert(Encoding.UTF8, Encoding.GetEncoding(Encoding.ASCII.EncodingName, new EncoderReplacementFallback("a"), new DecoderExceptionFallback()), Encoding.UTF8.GetBytes(line)));
    Console.WriteLine(asAscii);
    Console.ReadLine();

    sw.WriteLine(asAscii);
    sw.Flush();
}

Basically this searches the file for a specific character and replaces it with another. The problem that i am having is that my if statement doesn't work. How do i go about solving this?

This is a sample of the input file:

Dimàkàtso Mokgàlo
Màmà Ràtlàdi
Koos Nèl
Pàsèkà Modisè
Jèrèmiàh Morèmi
Khèthiwè Buthèlèzi
Tiànà Pillày
Viviàn Màswàngànyè
Thirèshàn Rèddy
Wàdè Cornèlius
ènos Nètshimbupfè

This is the output if use : line = line.Replace('à', 'a'); :

Ch�rl�n� Kirst�n
M�m� R�tl�di
Koos N�l
P�s�k� Modis�
J�r�mi�h Mor�mi
Kh�thiw� Buth�l�zi
Ti�n� Pill�y
Vivi�n M�sw�ng�ny�
Thir�sh�n R�ddy
W�d� Corn�lius
�nos N�tshimbupf�

With my code the symbol will be removed completely

标签: c# ascii
7条回答
爱情/是我丢掉的垃圾
2楼-- · 2020-01-30 12:35

I often use an extenstion method based on the version Dana supplied. A quick explanation:

  • Normalizing to form D splits charactes like è to an e and a nonspacing `
  • From this, the nospacing characters are removed
  • The result is normalized back to form D (I'm not sure if this is neccesary)

Code:

using System.Linq;
using System.Text;
using System.Globalization;

// namespace here
public static class Utility
{
    public static string RemoveDiacritics(this string str)
    {
        if (str == null) return null;
        var chars =
            from c in str.Normalize(NormalizationForm.FormD).ToCharArray()
            let uc = CharUnicodeInfo.GetUnicodeCategory(c)
            where uc != UnicodeCategory.NonSpacingMark
            select c;

        var cleanStr = new string(chars.ToArray()).Normalize(NormalizationForm.FormC);

        return cleanStr;
    }
}
查看更多
登录 后发表回答