Question:
Does anyone have any code for alphabetizing Arabic and Japanese text that is in Unicode? If the code were in Ruby, that would be great.
Answer 1:
Unicode code points are not listed in alphabetical order (Z < a, for example), although they are approximately in that order. There is a canonical Unicode ordering, defined by the Unicode Collation Algorithm, and there are also language-specific orderings (French order is not exactly the same as German or Czech order, even with the same alphabet), which can be specified in locale information. I think the ICU library contains the language-specific algorithms you are looking for.
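To make the "Z < a" point concrete, here is a minimal Ruby sketch comparing plain code-point order with a crude two-level key (compare case-insensitively first, then break ties by code point). This is not the Unicode Collation Algorithm; it only imitates the idea of comparing at more than one level. The helper name fold_case_key is hypothetical, and real language-specific collation would still call for ICU or a binding to it.

```ruby
# Plain String#<=> compares code points (bytes, for UTF-8 strings), so
# every uppercase ASCII letter sorts before every lowercase one.
words = ["zebra", "Apple", "apple", "Zulu"]
by_code_point = words.sort
p by_code_point
# => ["Apple", "Zulu", "apple", "zebra"]

# Hypothetical helper: a crude two-level sort key. The first element
# ignores case (roughly a "primary" comparison); the second breaks ties
# by code point, so "Apple" lands before "apple". The UCA's default
# tertiary level would put lowercase first instead.
# Note: String#downcase only folds ASCII letters before Ruby 2.4.
def fold_case_key(word)
  [word.downcase, word]
end

by_folded_key = words.sort_by { |w| fold_case_key(w) }
p by_folded_key
# => ["Apple", "apple", "zebra", "Zulu"]
```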
Answer 2:
I don't know Ruby, but Python has a function, ord(), that returns the Unicode code point of a character. For example:
>>> a = u'ل'
>>> ord(a)
1604
>>> b = u'ع'
>>> ord(b)
1593
Look for something like that in Ruby. I assume that the Arabic letters are listed in Unicode in alphabetical order.
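Ruby's counterpart to Python's ord() is String#ord (Ruby 1.9 and later), and String#codepoints returns every code point in a string. A minimal sketch, assuming a UTF-8 source file; sorting by ord only reflects Unicode code-point order, which for the three basic Arabic letters used here happens to match their alphabetical order.

```ruby
# encoding: utf-8
lam = "ل"  # ARABIC LETTER LAM, U+0644
ain = "ع"  # ARABIC LETTER AIN, U+0639

puts lam.ord                    # 1604
puts ain.ord                    # 1593
puts format("U+%04X", lam.ord)  # U+0644
p "لعب".codepoints              # [1604, 1593, 1576]

# Sort single letters using the code point as the key.
letters = [lam, ain, "ب"]       # "ب" is ARABIC LETTER BEH, U+0628
sorted_letters = letters.sort_by(&:ord)
puts sorted_letters.map { |c| format("U+%04X", c.ord) }.join(" ")
# U+0628 U+0639 U+0644
```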
Answer 3:
To ask the obvious question: what don't you like about mylist.sort?
Answer 4:
Depending on your needs, words.sort in Ruby will be fine for Japanese. The order in which characters appear in Unicode is a reasonably good sorting order. I can't vouch for Arabic, but my guess is that it's OK as well.
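Here is a small Ruby sketch of where plain sort works for Japanese and where it may not, assuming Ruby 1.9+ and UTF-8 strings. Hiragana words sort roughly in gojūon order by code point, but katakana occupies a later block, so a katakana word sorts after every hiragana word. The kana_key helper is hypothetical: it folds katakana onto hiragana with String#tr before comparing. It does nothing for kanji, whose code-point order does not follow their readings, so reading-based Japanese sorting needs readings (or a collation library) rather than code points alone.

```ruby
# encoding: utf-8
words = ["さくら", "カサ", "あめ"]

# Code-point order: all hiragana (U+3041..U+3096) precede all
# katakana (U+30A1..U+30FA), so "カサ" lands last.
plain = words.sort
puts plain.join(" ")   # あめ さくら カサ

# Hypothetical helper: map katakana ァ..ン onto hiragana ぁ..ん (the two
# ranges run in parallel, 0x60 apart), keeping the original as a tie-break.
def kana_key(word)
  [word.tr("ァ-ン", "ぁ-ん"), word]
end

folded = words.sort_by { |w| kana_key(w) }
puts folded.join(" ")  # あめ カサ さくら
```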
Answer 5:
mylist.sort should work out of the box in Ruby 1.9 (which has built-in Unicode support). In Ruby 1.8, where Unicode support isn't built in, I think you'd have to use the character-encodings gem to extend the String class with UTF-8 string comparisons. (And then mylist.sort would work.)
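In Ruby 1.9 and later, String#<=> compares the underlying bytes, and UTF-8 is designed so that byte order matches code-point order. A quick sketch, assuming all strings are UTF-8, that checks plain sort against an explicit code-point key:

```ruby
# encoding: utf-8
mylist = ["ل", "Zebra", "さくら", "apple", "ع", "あめ"]

by_bytes      = mylist.sort                   # String#<=>, byte-wise
by_codepoints = mylist.sort_by(&:codepoints)  # explicit code-point arrays

puts by_bytes == by_codepoints  # true
puts by_bytes.map { |s| format("U+%04X", s.ord) }.join(" ")
# U+005A U+0061 U+0639 U+0644 U+3042 U+3055
```

This confirms that mylist.sort gives code-point order, not a language-aware collation.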