I am trying to write a function that translates a string containing Unicode characters into some default ASCII transcription. Ideally, I'd like e.g. Ångström to become Angstroem or, if that is not possible, Angstrom. Likewise, α=χ should become a=x (c?) or similar.
Does Emacs have such built-in capabilities? I know I can get the names and similar properties of characters (get-char-code-property), but I know of no built-in transcription table.
The purpose is to translate titles of entries into meaningful, readable filenames, avoiding problems with software that doesn't understand Unicode.
My current strategy is to build a translation-table by hand, but this approach is fairly limited and requires a lot of maintenance.
There is no built-in capability that I know of. I wrote a package, unidecode, specifically for this task. It uses the same approach as Python's library of the same name. To install it, add the MELPA repository to your package archive list, then run M-x package-install RET unidecode.
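For reference, a typical way to register MELPA in an init file is sketched below. The archive URL is an assumption (the commonly used MELPA address) and is not taken from the original answer, so adjust it if your setup differs.

```elisp
;; Minimal sketch, assuming the standard MELPA archive URL.
(require 'package)

;; Append MELPA to the list of archives package.el searches.
(add-to-list 'package-archives
             '("melpa" . "https://melpa.org/packages/")
             t)

;; Make installed packages available; on recent Emacs versions this
;; usually happens automatically at startup.
(package-initialize)
```

After evaluating this, M-x package-refresh-contents should make unidecode visible to M-x package-install.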
unidecode has two functions: unidecode-unidecode, which turns Unicode into ASCII, and unidecode-sanitize, which discards non-alphanumeric characters and transforms spaces into hyphens.
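A rough usage sketch for the filename use case follows. It assumes the package is installed and that both functions take a single string argument, as the description above suggests. The helper my/title-to-filename and the ".org" extension are hypothetical, introduced only for illustration.

```elisp
;; Sketch only: assumes the unidecode package from MELPA is installed
;; and exposes the two functions named in this answer.
(require 'unidecode)

;; Transliterate a string to plain ASCII.
(unidecode-unidecode "Ångström")

;; Hypothetical helper (not part of the package): build a filename
;; from an entry title by sanitizing it and appending an extension.
(defun my/title-to-filename (title)
  "Return an ASCII-only filename derived from TITLE."
  (concat (unidecode-sanitize title) ".org"))

(my/title-to-filename "Ångström α=χ")
```

Evaluating the forms in a scratch buffer shows the exact transliteration the package produces for your titles, which is worth checking before relying on it for real filenames.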