I have a std::string with UTF-8 characters in it.
I want to convert the string to its closest equivalent with ASCII characters.
For example:
Łódź => Lodz
Assunção => Assuncao
Schloß => Schloss
Unfortunately, the ICU library is really unintuitive, and I haven't found good documentation on its usage, so it would take me too much time to learn to use it. Time I don't have.
Could someone give a short example of how this can be done?
Thanks.
I don't know about ICU, but iconv does this and it's quite easy to learn. It takes only about 3-4 calls, and what you need in your case is to set the ICONV_SET_TRANSLITERATE flag using iconvctl().
Try this:
ucnv_convert("US-ASCII", "UTF-8", target, targetsize, source, sourcesize, pError);
I wrote a callback that decomposes and then does some substitution. It could probably be implemented as a transliteration. The code is here: decompcb.c, and the header is nearby. Install it as follows on a Unicode-to-ASCII converter:
ucnv_setFromUCallBack(gConverter, &UCNV_FROM_U_CALLBACK_DECOMPOSE, &status);
Then use gConverter to convert from Unicode to ASCII.
This isn't an area I'm an expert in, but if you don't have a library handy that does it for you easily, then you might be better off just creating a lookup table/map that contains the UTF-8 -> ASCII values, i.e. the key is the UTF-8 character and the value is the ASCII sequence of characters that replaces it.
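A minimal sketch of the lookup-table idea, assuming the input is valid UTF-8. The function name to_ascii_lookup and the table contents are hypothetical and only cover the examples from the question; characters missing from the table are dropped. The keys are written as byte escapes so the code does not depend on the source file encoding.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical helper: maps known UTF-8 sequences to ASCII replacements.
// Non-ASCII characters that are not in the table are dropped.
std::string to_ascii_lookup(const std::string& in) {
    // Illustrative table only; extend it with the characters you expect.
    static const std::unordered_map<std::string, std::string> table = {
        {"\xC5\x81", "L"},   // Ł (U+0141)
        {"\xC3\xB3", "o"},   // ó (U+00F3)
        {"\xC5\xBA", "z"},   // ź (U+017A)
        {"\xC3\xA7", "c"},   // ç (U+00E7)
        {"\xC3\xA3", "a"},   // ã (U+00E3)
        {"\xC3\x9F", "ss"},  // ß (U+00DF)
    };

    std::string out;
    for (std::size_t i = 0; i < in.size();) {
        const unsigned char c = static_cast<unsigned char>(in[i]);
        if (c < 0x80) {  // plain ASCII: copy through unchanged
            out += in[i];
            ++i;
            continue;
        }
        // Length of the UTF-8 sequence, taken from its lead byte.
        std::size_t len = (c >= 0xF0) ? 4 : (c >= 0xE0) ? 3 : (c >= 0xC0) ? 2 : 1;
        if (len > in.size() - i) len = in.size() - i;  // truncated input

        const auto it = table.find(in.substr(i, len));
        if (it != table.end()) out += it->second;
        i += len;
    }
    return out;
}
```

With this table, to_ascii_lookup applied to the UTF-8 bytes of "Schloß" returns "Schloss". The obvious drawback is that the table has to be written and maintained by hand.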
The ß->ss decomposition tells me you want the compatibility decomposition. In ICU, you need the Normalizer class for that. Afterwards, you will end up with something like L'odz', i.e. the base letters with the accents split off into separate characters.
From this string, you can simply remove the non-ASCII characters. No need for ICU, plain STL will do.
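The removal step could be sketched as follows, assuming the std::string already holds the decomposed text as UTF-8. The name strip_non_ascii is hypothetical. Every byte of a multi-byte UTF-8 sequence has its high bit set, so dropping all bytes above 0x7F removes the combining marks and leaves the ASCII base letters.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper: removes every byte outside the 7-bit ASCII range.
// Assumes the input was already decomposed, so accents are separate
// combining characters following their base letters.
std::string strip_non_ascii(std::string s) {
    s.erase(std::remove_if(s.begin(), s.end(),
                           [](unsigned char c) { return c > 0x7F; }),
            s.end());
    return s;
}
```

Note that any character that does not decompose into an ASCII base letter is removed entirely by this step rather than replaced.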