I've found places on the web such as http://www.chinesetopinyin.com/ that convert Chinese characters to pinyin (romanization). Does anyone know how to do this, or have a database that can be parsed?
EDIT: I'm using C# but would actually prefer a database/flatfile.
Okay, first I used my question here to get the Unicode code point of each character:
Converting Chinese character to Unicode
Then I used a lookup table like this one to convert each code point to pinyin: http://www.ic.unicamp.br/~stolfi/voynich/Notes/061/uc-to-py.tbl
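A minimal Python sketch of those two steps. It assumes each table line holds a hexadecimal code point followed by one or more readings. The actual layout of uc-to-py.tbl may differ, so check the file first. The names `parse_table` and `to_pinyin` and the sample data are hypothetical:

```python
def parse_table(lines):
    """Map code point (int) -> list of readings, from lines like '4E2D zhong1 zhong4'.

    The line format is an assumption, not the documented layout of uc-to-py.tbl.
    """
    table = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code, *readings = line.split()
        if code.upper().startswith("U+"):
            code = code[2:]  # tolerate an optional 'U+' prefix
        table[int(code, 16)] = readings
    return table


def to_pinyin(text, table):
    """Replace each character with its first listed reading; keep unknown characters."""
    out = []
    for ch in text:
        readings = table.get(ord(ch))  # ord() gives the Unicode code point
        out.append(readings[0] if readings else ch)
    return out


# Hypothetical sample data in the assumed format.
SAMPLE = """\
# hypothetical sample
4E2D zhong1 zhong4
6587 wen2
"""

table = parse_table(SAMPLE.splitlines())
print(f"U+{ord('中'):04X}")  # U+4E2D
print(to_pinyin("中文", table))  # ['zhong1', 'wen2']
```

Picking the first reading is a simplification. Characters with several readings (polyphones) really need context to choose the right one.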
A possible solution using Python:
I think the Unicode Character Database contains pinyin romanizations for Chinese characters (in its Unihan data), but these are not included in the data of Python's unicodedata module. However, you can use an external library such as cjklib.
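You can also read the Unihan data directly. A rough sketch follows. It assumes the file Unihan_Readings.txt from the Unihan database, whose data lines have the tab-separated form `U+4E2D<TAB>kMandarin<TAB>reading`. The function names and sample lines are illustrative only and are not part of cjklib's API:

```python
def load_kmandarin(lines):
    """Map character -> list of kMandarin readings from Unihan_Readings.txt-style lines."""
    readings = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        code, field, value = line.rstrip("\n").split("\t", 2)
        if field == "kMandarin":  # ignore other reading fields (kCantonese, ...)
            readings[chr(int(code[2:], 16))] = value.split()
    return readings


def romanize(text, readings):
    """Join the first Mandarin reading of each character; keep unknown characters."""
    return " ".join(readings[ch][0] if ch in readings else ch for ch in text)


# Illustrative lines in the assumed Unihan_Readings.txt layout.
SAMPLE_LINES = [
    "# illustrative sample",
    "U+4E2D\tkCantonese\tzung1",
    "U+4E2D\tkMandarin\tzhōng",
    "U+6587\tkMandarin\twén",
]

readings = load_kmandarin(SAMPLE_LINES)
print(romanize("中文", readings))  # zhōng wén
```

With the real file, something like `with open("Unihan_Readings.txt", encoding="utf-8") as f: readings = load_kmandarin(f)` should work, since the parser strips trailing newlines itself.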
UPDATE
cjklib comes with a standalone cjknife utility, which might help. Some usage is described here.
If you use Java, you can use pinyin4j.
http://pinyin4j.sourceforge.net/