I am trying to write a program that transliterates CJK text into Latin script (e.g. Pinyin, Romaji, etc.). For example, you give it a Chinese, Japanese or Korean document as input and get the Latin transliteration as output.
I am new to this field, so please bear with me.
Obviously, I first need to detect the language (Chinese, Japanese or Korean) before going any further. Then, as far as I understand, I need to divide the text into words before transliterating, since these languages (Chinese and Japanese in particular) are written without spaces between words. This is called word segmentation. Finally, once I have the words, I need to transliterate them into Latin.
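The detection step described above can be sketched as follows. This is only a rough heuristic under stated assumptions, not a real language identifier: it assumes UTF-8 input, and it guesses from Unicode blocks alone (any kana means Japanese, otherwise any Hangul means Korean, otherwise any Han ideograph means Chinese). The names `Lang`, `decode_utf8` and `guess_language` are hypothetical and do not come from any library.

```cpp
#include <cstddef>
#include <string>
#include <vector>

enum class Lang { Chinese, Japanese, Korean, Unknown };

// Decode a UTF-8 string into code points.
// Assumption: input is well formed. Invalid lead bytes are skipped,
// continuation bytes are not validated, and a truncated tail is dropped.
std::vector<char32_t> decode_utf8(const std::string& s) {
    std::vector<char32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char c = static_cast<unsigned char>(s[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else { ++i; continue; }      // stray continuation or invalid lead byte
        if (i + extra >= s.size()) break;  // sequence runs past the end
        for (std::size_t k = 1; k <= extra; ++k)
            cp = (cp << 6) | (static_cast<unsigned char>(s[i + k]) & 0x3F);
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Guess the language from the scripts present in the text.
// Han ideographs occur in Chinese, Japanese and sometimes Korean,
// so kana and Hangul are checked first.
Lang guess_language(const std::string& utf8) {
    std::size_t kana = 0, hangul = 0, han = 0;
    for (char32_t cp : decode_utf8(utf8)) {
        if ((cp >= 0x3041 && cp <= 0x3096) ||      // Hiragana letters
            (cp >= 0x30A1 && cp <= 0x30FA))        // Katakana letters
            ++kana;
        else if ((cp >= 0xAC00 && cp <= 0xD7A3) || // Hangul syllables
                 (cp >= 0x1100 && cp <= 0x11FF))   // Hangul Jamo
            ++hangul;
        else if (cp >= 0x4E00 && cp <= 0x9FFF)     // CJK Unified Ideographs
            ++han;
    }
    if (kana > 0)   return Lang::Japanese;
    if (hangul > 0) return Lang::Korean;
    if (han > 0)    return Lang::Chinese;
    return Lang::Unknown;
}
```

Calling `guess_language` on a whole document returns a single guess. Mixed-language documents, or Japanese text written only in kanji, would need something more careful than this.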
So here are my questions:
- There are some libraries that do transliteration. Since I'm looking for open-source ones in C/C++, I found Adson (Chinese only) and ICU4C. The Git repo I cloned for Adson didn't compile, and I was not able to find a simple, straightforward tutorial for ICU4C. Where can I find a tutorial on ICU4C usage? Do you know of any other library that transliterates CJK to Latin? If its accuracy is high (~90%), I can drop the requirement that it be written in C++.