The problem: I have two fixed-width strings from an external system. The first contains the base characters (like a-z); the second may contain diacritics that have to be applied to the characters at the same positions in the first string to create the actual characters.
string asciibase = "Dutch has funny chars: a,e,u";
string diacrits = " ' \" \"";
//no clue what to do
string result = "Dutch has funny chars: á,ë,ü";
I could write a massive search and replace for all characters + different diacritics but was hoping for something a bit more elegant.
Does anybody have a clue how to solve this one? I tried calculating the decimal character values and using string.Normalize (C#), but got no results. Google didn't turn up anything useful either.
I don't know C#, or its standard libraries, but one alternative approach might be to utilize something like an existing HTML/SGML/XML character entity parser/renderer, or if you actually are going to present it to a browser, nothing!
Pseudo code:
Thus,
A + o
-> Å
u + "
-> ü
and so on. If you can then parse HTML entities, you should be home free, and even portable between charsets!
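As a rough C# illustration of the entity idea (not the pseudo code from this answer), the sketch below builds an HTML entity name from each base character plus a suffix and lets `System.Net.WebUtility.HtmlDecode` resolve it. The class name `EntityCombiner` and the diacritic-to-suffix table are assumptions made for this example; extend the table to whatever the external system actually sends.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text;

static class EntityCombiner
{
    // Hypothetical mapping from the diacritic characters in the second
    // string to HTML entity name suffixes (e.g. 'e' + "acute" -> &eacute;).
    static readonly Dictionary<char, string> Suffix = new Dictionary<char, string>
    {
        { '\'', "acute" },
        { '"',  "uml"   },
        { '`',  "grave" },
        { '^',  "circ"  },
        { '~',  "tilde" },
    };

    public static string Combine(string baseChars, string diacritics)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < baseChars.Length; i++)
        {
            // Treat a missing position in the diacritics string as "no diacritic".
            char d = i < diacritics.Length ? diacritics[i] : ' ';
            string suffix;
            if (Suffix.TryGetValue(d, out suffix))
                sb.Append('&').Append(baseChars[i]).Append(suffix).Append(';');
            else
                // Encode plain characters so '&' or '<' survive the decode step.
                sb.Append(WebUtility.HtmlEncode(baseChars[i].ToString()));
        }
        // Entity names that do not exist (say "&xacute;") are left as literal text.
        return WebUtility.HtmlDecode(sb.ToString());
    }
}
```

With this table, `EntityCombiner.Combine("a,e,u", "' \" \"")` pairs a with ', e with " and u with ". Only combinations that exist as named HTML entities are resolved, which is the main limitation of this route.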
Convert the diacritics to suitable Unicode values from the Unicode combining diacritical marks range:
http://www.unicode.org/charts/PDF/U0300.pdf
Then slap the char and its diacritic together, e.g. for e-acute, U+0065 = "e" and U+0301 = combining acute accent.
Then concatenating the two, base character first and combining mark second, will combine them into a new string.
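A minimal C# sketch of that step, assuming you also want the single precomposed code point rather than the two-character sequence: concatenate the base character and the combining mark, then normalize to form C. The names `CombiningDemo` and `Compose` are made up for this example.

```csharp
using System;
using System.Text;

static class CombiningDemo
{
    // Joins a base character and a combining mark, then asks the runtime
    // to compose them into a single code point where one exists.
    public static string Compose(char baseChar, char combiningMark)
    {
        string decomposed = new string(new[] { baseChar, combiningMark });
        return decomposed.Normalize(NormalizationForm.FormC);
    }
}
```

For example, `CombiningDemo.Compose('e', '\u0301')` returns the one-character string "é" (U+00E9). Without the `Normalize` call the string still displays as é in most fonts, but it has length 2.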
The problem is that the supplied diacritics have to be parsed explicitly, because some of them have no standalone ASCII character: there is no lone diaeresis (the double dots), for example, so the double quote is used in its place. To solve your problem you therefore have no choice but to implement each needed case.
Here is a starting point to give you an idea...
The Enumerable.Zip extension method is already implemented in .NET 4, but to get it in 3.5 you'll need this code (taken from Eric Lippert):
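For readers without the original listing, a zip helper for .NET 3.5 can be sketched roughly as below. This is written from scratch for illustration, not copied from Eric Lippert's post, and it is deliberately named `ZipPairs` (a made-up name) so it does not clash with the built-in `Zip` on newer frameworks.

```csharp
using System;
using System.Collections.Generic;

static class ZipCompat
{
    // Validates arguments eagerly, then defers the actual pairing to an
    // iterator so exceptions surface at the call site, not on first MoveNext.
    public static IEnumerable<TResult> ZipPairs<TFirst, TSecond, TResult>(
        this IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        if (first == null) throw new ArgumentNullException("first");
        if (second == null) throw new ArgumentNullException("second");
        if (resultSelector == null) throw new ArgumentNullException("resultSelector");
        return ZipIterator(first, second, resultSelector);
    }

    private static IEnumerable<TResult> ZipIterator<TFirst, TSecond, TResult>(
        IEnumerable<TFirst> first,
        IEnumerable<TSecond> second,
        Func<TFirst, TSecond, TResult> resultSelector)
    {
        using (IEnumerator<TFirst> e1 = first.GetEnumerator())
        using (IEnumerator<TSecond> e2 = second.GetEnumerator())
        {
            // Stops as soon as either sequence runs out.
            while (e1.MoveNext() && e2.MoveNext())
                yield return resultSelector(e1.Current, e2.Current);
        }
    }
}
```

Since a `string` is an `IEnumerable<char>`, the two input strings can be paired directly, e.g. `asciibase.ZipPairs(diacrits, (c, d) => ...)`. Because pairing stops at the shorter sequence, the diacritics string should be padded with spaces to the length of the base string first, otherwise trailing base characters are dropped.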
I cannot find an easy solution except using lookup tables:
[EDIT: Simplified code after suggestions in the answers from @JonB and @Oliver]
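A lookup-table approach along these lines could look like the following sketch. It is not the code originally posted here: the class name `DiacriticMerger` and the contents of the table are assumptions, and the table needs an entry for every diacritic character the external system can send.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

static class DiacriticMerger
{
    // Assumed mapping from the ASCII stand-ins in the diacritics string
    // to Unicode combining marks (U+0300 block).
    static readonly Dictionary<char, char> CombiningMarks = new Dictionary<char, char>
    {
        { '`',  '\u0300' }, // combining grave accent
        { '\'', '\u0301' }, // combining acute accent
        { '^',  '\u0302' }, // combining circumflex accent
        { '~',  '\u0303' }, // combining tilde
        { '"',  '\u0308' }, // combining diaeresis
    };

    public static string Merge(string asciiBase, string diacritics)
    {
        var sb = new StringBuilder(asciiBase.Length * 2);
        for (int i = 0; i < asciiBase.Length; i++)
        {
            sb.Append(asciiBase[i]);
            char mark;
            // Spaces and unknown characters mean "no diacritic at this position".
            if (i < diacritics.Length && CombiningMarks.TryGetValue(diacritics[i], out mark))
                sb.Append(mark);
        }
        // Compose base + mark pairs into precomposed characters where they exist.
        return sb.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

With the question's strings, and the diacritics string written out at full width as `new string(' ', 23) + "' \" \""` so that each mark lines up with its base character, `DiacriticMerger.Merge(asciibase, diacrits)` returns "Dutch has funny chars: á,ë,ü".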