Alphabetize Arabic and Japanese text that is in Unicode

Posted 2020-02-29 03:35

Question:

Does anyone have any code for alphabetizing Arabic and Japanese text that is in Unicode? If the code were in Ruby, that would be great.

Answer 1:

Unicode code points are not assigned in alphabetical order (Z < a, for example), although within a script they try to follow it approximately. There is a default Unicode order, defined by the Unicode Collation Algorithm (UCA), and there are also language-specific orderings (French order is not exactly the same as German or Czech order, even with the same alphabet), which can be specified in locale information. I think the ICU library contains the language-specific algorithms you are looking for.
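To make the gap between raw code point order and a language-aware order concrete, here is a minimal Ruby sketch for Arabic. It is not the Unicode Collation Algorithm and not ICU's Arabic tailoring; it only builds a crude sort key that ignores harakat and tatweel and folds the hamza-carrying alef forms onto bare alef before comparing code points. The helper `arabic_sort_key` and its folding rules are illustrative assumptions, not a standard.

```ruby
# Crude, hypothetical sort key for Arabic words (NOT the UCA, NOT ICU).
# It only illustrates the kind of tailoring a real collator does for you.

# Harakat (U+064B..U+0652) and tatweel (U+0640) are ignored for sorting.
IGNORABLE = /[\u064B-\u0652\u0640]/

# Alef with madda (U+0622), hamza above (U+0623) and hamza below (U+0625)
# are folded onto the bare alef (U+0627).
ALEF_FORMS = "\u0622\u0623\u0625"
BARE_ALEF  = "\u0627\u0627\u0627"

def arabic_sort_key(word)
  word.gsub(IGNORABLE, "").tr(ALEF_FORMS, BARE_ALEF).codepoints
end

words = [
  "\u0628\u064A\u062A", # bayt: beh, yeh, teh
  "\u0623\u0633\u062F", # asad: alef with hamza above, seen, dal
  "\u0627\u0628\u0646", # ibn: alef, beh, noon
  "\u0628\u0627\u0628"  # bab: beh, alef, beh
]

# Raw code point order: asad (starts U+0623) sorts before ibn (starts U+0627).
puts words.sort.join(" ")
# Folded key: both start with U+0627, so ibn (beh) precedes asad (seen).
puts words.sort_by { |w| arabic_sort_key(w) }.join(" ")
```

A real collator handles many more cases (teh marbuta, alef maksura, Persian and Urdu letters, multi-level comparison), which is why a library such as ICU is the safer choice for production use.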



Answer 2:

I don't know Ruby, but Python has a function, ord(), that translates a Unicode character to its code point. For example:

>>> a = u'ل'
>>> ord(a)
1604
>>> b = u'ع'
>>> ord(b)
1593

Look for something like that in Ruby. I assume that the Arabic letters are encoded in Unicode in alphabetical order.
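Ruby does have direct counterparts: `String#ord` returns the code point of a string's first character, `String#codepoints` returns all of them, and `Integer#chr` with an encoding argument goes the other way. A small sketch reproducing the Python session above, assuming a UTF-8 source file (the default since Ruby 2.0):

```ruby
lam = "\u0644" # Arabic letter lam
ain = "\u0639" # Arabic letter ain

puts lam.ord                          # 1604, same as Python's ord()
puts ain.ord                          # 1593
p "#{lam}#{ain}".codepoints           # [1604, 1593]
puts 1604.chr(Encoding::UTF_8) == lam # true: code point back to character

# Sorting by code point puts ain (1593) before lam (1604).
sorted = [lam, ain].sort_by(&:ord)
puts sorted == [ain, lam]             # true
```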



Answer 3:

To ask the obvious question, what don't you like about mylist.sort?



Answer 4:

Depending on your needs, words.sort in Ruby will be fine for Japanese: the order in which characters appear in Unicode is a reasonably good sorting order. I can't vouch for Arabic, but my guess is that it's OK as well.
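One caveat worth illustrating for Japanese: in plain code point order every hiragana word sorts before every katakana word, because hiragana sit at U+3041 to U+3096 and katakana at U+30A1 to U+30F6. A minimal sketch of one simple workaround, assuming the words are written in kana only, is to fold katakana onto hiragana (the two blocks are offset by 0x60) before comparing. The helper `kana_sort_key` is hypothetical; it does nothing for kanji, which need their readings before they can be ordered, and it ignores finer points such as the long-vowel mark.

```ruby
KATAKANA_RANGE = 0x30A1..0x30F6 # small a .. small ke in the katakana block
KANA_OFFSET    = 0x60           # katakana code point minus hiragana code point

# Hypothetical sort key: map each katakana code point onto the matching
# hiragana one and leave everything else (including kanji) unchanged.
def kana_sort_key(word)
  word.codepoints.map { |cp| KATAKANA_RANGE.cover?(cp) ? cp - KANA_OFFSET : cp }
end

words = ["カメ", "きつね", "あり", "イヌ"] # kame, kitsune, ari, inu

# Code point order: all hiragana first, then all katakana.
puts words.sort.join(" ")                                # あり きつね イヌ カメ
# Folded order follows the gojuon sequence a, i, ka, ki.
puts words.sort_by { |w| kana_sort_key(w) }.join(" ")   # あり イヌ カメ きつね
```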



Answer 5:

mylist.sort should work out of the box in Ruby 1.9, which has built-in Unicode support. In Ruby 1.8, where Unicode support isn't built in, I think you'd have to use the character-encodings gem to extend the String class with UTF-8 string comparisons (and then mylist.sort would work).
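A quick way to see what mylist.sort does in modern Ruby: `String#<=>` compares the underlying bytes, and because UTF-8 preserves code point order under byte comparison, sorting UTF-8 strings gives the same result as sorting by code points. The sketch below, assuming all strings are UTF-8, checks that and also shows that the result is code point order rather than a linguistic order.

```ruby
mylist = ["イヌ", "apple", "\u0628\u0627\u0628", "Zebra", "あり"]

# Default sort: String#<=> compares the UTF-8 bytes.
by_bytes = mylist.sort
# Explicit code point order, for comparison.
by_codepoints = mylist.sort_by(&:codepoints)

puts by_bytes == by_codepoints # true: UTF-8 byte order equals code point order
p by_bytes.first(2)            # ["Zebra", "apple"]: "Z" (U+005A) < "a" (U+0061)
```

So mylist.sort works in the sense of being consistent and deterministic, but for a dictionary-style order you still need a collation step like the ones discussed in the earlier answers.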