-->

In Ruby, how to convert special characters like ë,

Posted 2020-07-13 07:29

Question:

I want to convert characters like ë to plain e. The goal is to match how people actually type city names. For example, most people type Brasilia when searching for it, instead of Brasília, and news agencies like Reuters usually spell it Brasilia when they report on Brasília. I am looking for a gem, or preferably a character-encoding technique, since that kind of answer can also serve as a reference for other languages.

This is just to handle the typical "extended ASCII" character sets. Note: I am working with standard Unicode strings.

Answer 1:

Starting with Ruby 2.2, there is String#unicode_normalize to normalize Unicode strings. The NFKD form decomposes an accented character into its base character followed by a combining mark (the diacritic):

'ë'.unicode_normalize(:nfkd).chars
#=> ["e", "̈"]
#     ^    ^
#   base  combining mark (U+0308)

Since the base character is a valid ASCII codepoint and the combining mark is not, encoding to ASCII with an empty replacement string removes the latter:

'ë,à,é,ä'.unicode_normalize(:nfkd).encode('ASCII', replace: '')
#=> "e,a,e,a"
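
A variation on the same idea, as a rough sketch rather than part of the original answer: normalize to NFD and delete only the combining marks (Unicode category Mn), so that letters with no decomposition, such as Ø, are kept instead of being dropped by the ASCII conversion. The method name strip_diacritics is made up for this example.

```ruby
# Hypothetical helper; the name is chosen for illustration only.
# NFD splits each accented character into base + combining marks,
# then gsub removes the marks (category Mn, "nonspacing mark").
def strip_diacritics(str)
  str.unicode_normalize(:nfd).gsub(/\p{Mn}/, '')
end

puts strip_diacritics('Brasília')  # prints Brasilia
puts strip_diacritics('ë,à,é,ä')   # prints e,a,e,a
puts strip_diacritics('Øresund')   # prints Øresund (Ø has no decomposition)
```

Unlike the encode('ASCII', ...) approach, the result is not guaranteed to be pure ASCII, so which variant fits depends on whether losing such letters entirely is acceptable for the search use case.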


Answer 2:

You may be looking for I18n.transliterate.

It is provided by the i18n gem; install it with gem install i18n.

Example:

irb(main):001:0> require 'i18n'
=> true
irb(main):002:0> I18n.enforce_available_locales = false
=> false
irb(main):003:0> I18n.transliterate("ë,à,é,ä")
=> "e,a,e,a"