Emacs lisp: Translate characters to standard ASCII

I am trying to write a function that translates a string containing Unicode characters into some default ASCII transcription. Ideally, I'd like e.g. Ångström to become Angstroem or, if that is not possible, Angstrom. Likewise, α=χ should become a=x (c?) or similar.

Does Emacs have such a capability built in? I know I can look up the names and similar properties of characters (get-char-code-property), but I know of no built-in transcription table.

The purpose is to translate titles of entries into meaningfully readable filenames, avoiding problems with software that doesn't understand Unicode.

My current strategy is to build a translation-table by hand, but this approach is fairly limited and requires a lot of maintenance.

Tags: emacs unicode character elisp translation

1 answer

爷的心禁止访问

#2 · 2019-06-26 09:50

There is no built-in capability that I know of. I wrote a package, unidecode, specifically for this task. It uses the same approach as the Python library of the same name. To install it, add the MELPA repository to your package archive list:

(add-to-list 'package-archives
  '("melpa" . "http://melpa.milkbox.net/packages/") t)

Then run M-x package-install RET unidecode. The package provides two functions: unidecode-unidecode, which turns Unicode into ASCII, and unidecode-sanitize, which discards non-alphanumeric characters and turns spaces into hyphens.

ELISP> (unidecode-unidecode "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა")
"!Hola!, Gruss Gott, Hyvaa paivaa, Tere ohtust, Bongu Czesc!, Dobry den, Zdravstvuite!, Geia sas, lmsllmlllmckhmslmgll"
ELISP> (unidecode-sanitize "¡Hola!, Grüß Gott, Hyvää päivää, Tere õhtust, Bonġu Cześć!, Dobrý den, Здравствуйте!, Γειά σας, გამარჯობა")
"hola-gruss-gott-hyvaa-paivaa-tere-ohtust-bongu-czesc-dobry-den-zdravstvuite-geia-sas-lmsllmlllmckhmslmgll"
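
For the filename use case from the question, unidecode-sanitize can be wrapped in a small helper. This is only a sketch: the name my-title-to-filename and the default ".txt" extension are made up for illustration and are not part of the package. It assumes unidecode is installed and loadable as the feature `unidecode`.

```elisp
;; Sketch only: `my-title-to-filename' and the ".txt" default are
;; hypothetical; `unidecode-sanitize' comes from the unidecode package.
(require 'unidecode)

(defun my-title-to-filename (title &optional extension)
  "Return an ASCII-only file name built from TITLE.
EXTENSION, if given, should include the leading dot; it defaults
to \".txt\" (an assumption made for this example)."
  (concat (unidecode-sanitize title) (or extension ".txt")))

;; Example call; the result is the sanitized title followed by ".org".
(my-title-to-filename "Grüß Gott, Dobrý den" ".org")
```

Because unidecode-sanitize already drops punctuation and replaces spaces with hyphens, the helper only has to append the extension.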
