Convert two ascii characters to their 'corresp

The problem: I have two fixed width strings from an external system. The first contains the base characters (like a-z), the second (MAY) contain diacritics to be appended to the first string to create the actual characters.

string asciibase = "Dutch has funny chars: a,e,u";
string diacrits  = "                       ' \" \"";

//no clue what to do

string result = "Dutch has funny chars: á,ë,ü";

I could write a massive search and replace for all characters + different diacritics but was hoping for something a bit more elegant.

Somebody have a clue how to fix this one? Tried it with calculating the decimal values, using string.Normalize (c#) but no results. Also Google didn't really turn up with something.

标签： c# localization ascii extended-ascii

4条回答

欢心

2楼-- · 2020-07-10 10:44

I don't know C#, or its standard libraries, but one alternative approach might be to utilize something like an existing HTML/SGML/XML character entity parser/renderer, or if you actually are going to present it to a browser, nothing!

Pseudo code:

for(i=0; i < strlen(either_string); i++) {
  if isspace(diacrits[i]) {
     output(asciibase[i]);
  }else{
     output("&");
     output(asciibase[i]);
     switch (diacrits[i]) {
       case '"' : output "uml"; break;
       case '^' : output "circ"; break;
       case '~' : output "tilde"; break;
       case 'o' : output "ring"; break;
       ... and so on for each "code" in the diacrits modifier
       ... (for acute, grave, cedil, lig, ...)
     }
     output(";");
  }
}

Thus, A + o -> Å, u + " -> ü and so on.

If you can then parse html entities, you should then be home free, and even portable between charsets!

0人赞添加讨论(0) 举报

Deceive 欺骗

3楼-- · 2020-07-10 10:53

Convert the diacritics to suitable unicode values from the Unicode combining diacritical marks range:

http://www.unicode.org/charts/PDF/U0300.pdf

Then slap the char and its diacritic together e.g. for e-acute, U+0065 = "e" and U+0301 = acute.

  String s = "\u0065\u0301";

Then:

  string normalisedString = s.Normalize();

Will combine the two into a new string.

0人赞添加讨论(0) 举报

倾城　Initia

4楼-- · 2020-07-10 11:00

The problem is, that the specified diacrits have to be explicitly parsed, cause the double points don't exists sole and so the double quotes are used for this case. So to accomplish your problem you don't have any other chance then to implement each needed case.

Here is a starting point to get a clue...

    public SomeFunction()
    {
        string asciiChars = "Dutch has funny chars: a,e,u";
        string diacrits = "                       ' \" \"";

        var combinedChars = asciiChars.Zip(diacrits, (ascii, diacrit) =>
        {
            return CombineChars(ascii, diacrit);
        });

        var Result = new String(combinedChars.ToArray());
    }

    private char CombineChars(char ascii, char diacrit)
    {
        switch (diacrit)
        {
            case '"':
                return AddDoublePoints(ascii);
            case '\'':
                return AddAccent(ascii);
            default:
                return ascii;
        }
    }

    private char AddDoublePoints(char ascii)
    {
        switch (ascii)
        {
            case 'a':
                return 'ä';
            case 'o':
                return 'ö';
            case 'u':
                return 'ü';
            default:
                return ascii;
        }
    }

    private char AddAccent(char ascii)
    {
        switch (ascii)
        {
            case 'a':
                return 'á';
            case 'o':
                return 'ó';
            default:
                return ascii;
        }
    }
}

The IEnumerable.Zip is already implemented in .Net 4, but to get it in 3.5 you'll need this code (taken from Eric Lippert):

public static class IEnumerableExtension
{
    public static IEnumerable<TResult> Zip<TFirst, TSecond, TResult>
        (this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        if (resultSelector == null) throw new ArgumentNullException("resultSelector");
        return ZipIterator(first, second, resultSelector);
    }

    private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>
        (IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        using (IEnumerator<TFirst> e1 = first.GetEnumerator())
        using (IEnumerator<TSecond> e2 = second.GetEnumerator())
            while (e1.MoveNext() && e2.MoveNext())
                yield return resultSelector(e1.Current, e2.Current);
    }
}

0人赞添加讨论(0) 举报

够拽才男人

5楼-- · 2020-07-10 11:03

I cannot find an easy solution except using lookup tables:

public void TestMethod1()
{
    string asciibase = "Dutch has funny chars: a,e,u";
    string diacrits = "                       ' \" \"";
    var merged = DiacritMerger.Merge(asciibase, diacrits);
}

[EDIT: Simplified code after suggestions in the answers from @JonB and @Oliver]

public class DiacritMerger
{
    static readonly Dictionary<char, char> _lookup = new Dictionary<char, char>
                         {
                             {'\'', '\u0301'},
                             {'"', '\u0308'}
                         };

    public static string Merge(string asciiBase, string diacrits)
    {
        var combined = asciiBase.Zip(diacrits, (ascii, diacrit) => DiacritVersion(diacrit, ascii));
        return new string(combined.ToArray());
    }

    private static char DiacritVersion(char diacrit, char character)
    {
        char combine;
        return _lookup.TryGetValue(diacrit, out combine) ? new string(new [] {character, combine}).Normalize()[0] : character;
    }
}

0人赞添加讨论(0) 举报

Convert two ascii characters to their 'corresp

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间