Are there any scripts, libraries, or programs using Python, or Bash tools (e.g. awk, perl, sed), which can correctly convert numbered pinyin (e.g. dian4 nao3) to UTF-8 pinyin with tone marks (e.g. diàn nǎo)?
I have found the following examples, but they require PHP or C#:
- PHP: Convert numbered to accentuated Pinyin?
- C#: Any libraries to convert number Pinyin to Pinyin with tone markings?
I have also found various online tools, but they cannot handle a large number of conversions.
The cjklib library does cover your needs:
Either use the Python shell:
Or just the command line:
Disclaimer: I developed that library.
Updated code: be careful, @Lakedaemon's Kotlin code does not take the tone placement rules into account.
I originally ported @Lakedaemon's Kotlin code to Java. I have now modified it, and I urge anyone who used my earlier port or @Lakedaemon's Kotlin code to update.
I added an extra auxiliary function to get the correct tone mark position.
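For reference, the placement rules usually given for Hanyu Pinyin are: an a or e takes the mark; otherwise, in ou, the o takes it; otherwise the last vowel takes it. Below is a minimal Python sketch of such a helper. It is not the Java function from this answer, the name `tone_vowel_index` is hypothetical, and it assumes the input is a single syllable with the tone digit already removed and ü already written as ü.

```python
VOWELS = "aeiouü"


def tone_vowel_index(syllable):
    """Return the index of the vowel that should carry the tone mark.

    Returns -1 when the syllable contains no vowel.
    (Hypothetical helper name, used for illustration only.)
    """
    s = syllable.lower()
    # Rule 1: an 'a' or an 'e' always takes the mark.
    for vowel in "ae":
        i = s.find(vowel)
        if i != -1:
            return i
    # Rule 2: in the combination 'ou', the 'o' takes the mark.
    i = s.find("ou")
    if i != -1:
        return i
    # Rule 3: otherwise the last vowel takes the mark.
    for i in range(len(s) - 1, -1, -1):
        if s[i] in VOWELS:
            return i
    return -1


print(tone_vowel_index("dian"))  # index of 'a'
print(tone_vowel_index("gui"))   # index of 'i' (last vowel)
```

Without rule 3, a syllable such as gui4 would wrongly be marked on the u; that kind of error is what a placement helper is meant to prevent.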
I ported the code from dani_l to Kotlin (the code in Java should be quite similar). It goes:
I wrote another Python function that does this, which is case-insensitive and preserves spaces, punctuation and other text (unless there are false positives, of course):
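A minimal sketch of one way such a function could look (not the answerer's original code). The names `convert` and `_mark` are hypothetical. It assumes every syllable carries its own tone digit 1-5, treats "u:" and "v" as spellings of ü, and builds the accented letters by adding a combining diacritic and normalising to NFC, so upper and lower case are handled the same way.

```python
import re
import unicodedata

# Combining diacritics for tones 1-4; tone 5 (neutral) gets no mark.
COMBINING = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}

# A run of letters (ü allowed, and ':' only directly after a u) plus a tone digit.
SYLLABLE = re.compile(r"((?:[A-Za-züÜ]|(?<=[uU]):)+)([1-5])")

# Vowel that carries the mark: an a/e, else the o of "ou", else the last vowel.
TARGET = re.compile(
    r"[aeAE]|[oO](?=[uU])|[aeiouüAEIOUÜ](?=[^aeiouüAEIOUÜ]*$)"
)


def _mark(match):
    letters, tone = match.group(1), match.group(2)
    # Normalise the alternative spellings of ü, keeping the case.
    letters = letters.replace("u:", "ü").replace("U:", "Ü")
    letters = letters.replace("v", "ü").replace("V", "Ü")
    target = TARGET.search(letters)
    if target is None:
        return match.group(0)  # no vowel (e.g. "mp3"): leave the text alone
    i = target.end()
    marked = letters[:i] + COMBINING.get(tone, "") + letters[i:]
    # NFC turns "a" + combining grave into the single character "à", etc.
    return unicodedata.normalize("NFC", marked)


def convert(text):
    """Replace numbered pinyin syllables in text with tone-marked ones."""
    return SYLLABLE.sub(_mark, text)


print(convert("dian4 nao3"))
print(convert("Ni3 hao3, nv3 hai2!"))
```

Anything the pattern does not match is passed through unchanged, which is how spaces and punctuation survive. The cost is false positives: any letter run that contains a vowel and is followed by a digit from 1 to 5 is treated as pinyin.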
I've got some Python 3 code that does this, and it's small enough to just put directly in the answer here.
This handles ü, u:, and v, all of which I've encountered. Minor modifications will be needed for Python 2 compatibility.
I came across a VBA macro that does it in Microsoft Word, at pinyinjoe.com
It had a minor flaw, which I reported, and the author responded that he would incorporate my suggestion "as soon as I can". That was in early January 2014; I haven't had any motivation to check, since it is already done in my copy.