Ruby: Checking for East Asian Width (Unicode)

Posted 2019-04-28 16:08

Question:

Using Ruby, I have to output strings in a columnar format to the terminal. Something like this:

| row 1     | a string here     | etc
| row 2     | another string    | etc

I can do this fine with Latin characters in UTF-8 using String#ljust and %s.

But a problem arises when the characters are Korean, Chinese, etc. The columns simply won't align when there are rows of English interspersed with rows containing Korean, etc.
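A minimal illustration of why the columns drift (the strings are arbitrary examples): both String#ljust and the `%-Ns` format directive pad to a number of characters, not terminal columns, and a Hangul syllable is one character but two columns wide in a typical terminal.

```ruby
# String#ljust and "%-Ns" both pad to N *characters*, not N terminal columns.
# Hangul syllables are East Asian Wide, so each one takes two columns.
latin  = "abc".ljust(6)            # 6 characters, 6 columns
korean = "한국어".ljust(6)          # 6 characters, but 9 columns
fmt    = format("%-6s|", "한국어")  # format strings follow the same rule

puts "|#{latin}|"   # the closing bars of these two lines
puts "|#{korean}|"  # will not line up in a terminal
puts fmt
```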

How can I get column alignment here? Is there a way to output Asian characters in the equivalent of a fixed-width font? How about for documents that are meant to be displayed and edited in Vim?

Answer 1:

Late to the party, but hopefully still helpful: in Ruby, you can use the unicode-display_width gem to check a string's East Asian Width, i.e. how many terminal columns it occupies:

require 'unicode/display_width'
Unicode::DisplayWidth.of("⚀") #=> 1
Unicode::DisplayWidth.of('一') #=> 2


Answer 2:

Your problem happens with CJK (Chinese/Japanese/Korean) full-width and wide characters; each of these occupies two cells in a fixed-width terminal. String#ljust and friends don't take this into account.

There is unicodedata.east_asian_width in Python, which would let you write your own width-aware ljust, but Ruby doesn't seem to have an equivalent. The best I've been able to find is this blog post (in Japanese): http://d.hatena.ne.jp/hush_puppy/20090227/1235740342. If you look at the output at the bottom of the post, it seems to do what you want, so maybe you can reuse some of its Ruby code.
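For illustration, here is a rough plain-Ruby sketch of such a width-aware ljust. The names `WIDE_RANGES`, `display_width` and `width_ljust` are hypothetical, and the code-point ranges are a hand-picked approximation of the Wide (W) and Fullwidth (F) categories, not the full EastAsianWidth.txt table: ambiguous-width characters count as 1, and combining or zero-width characters are not handled. The unicode-display_width gem from the other answer is the more complete option.

```ruby
# ASSUMPTION: these ranges only approximate East Asian Width W and F;
# they are not the complete Unicode table.
WIDE_RANGES = [
  0x1100..0x115F,   # Hangul Jamo initial consonants
  0x2E80..0x303E,   # CJK and Kangxi radicals, CJK symbols and punctuation
  0x3041..0x33FF,   # Hiragana, Katakana, Bopomofo, CJK compatibility
  0x3400..0x4DBF,   # CJK Unified Ideographs Extension A
  0x4E00..0x9FFF,   # CJK Unified Ideographs
  0xA000..0xA4CF,   # Yi
  0xAC00..0xD7A3,   # Hangul Syllables
  0xF900..0xFAFF,   # CJK Compatibility Ideographs
  0xFE30..0xFE4F,   # CJK Compatibility Forms
  0xFF00..0xFF60,   # Fullwidth Forms
  0xFFE0..0xFFE6,   # Fullwidth signs
  0x20000..0x2FFFD, # CJK Extension B and later
  0x30000..0x3FFFD
].freeze

# Terminal columns taken by a single character (1 or 2).
def char_width(char)
  cp = char.ord
  WIDE_RANGES.any? { |r| r.cover?(cp) } ? 2 : 1
end

# Terminal columns taken by a whole string.
def display_width(str)
  str.each_char.sum { |c| char_width(c) }
end

# Width-aware String#ljust: pad with spaces until the string fills
# +width+ terminal columns. Never truncates.
def width_ljust(str, width)
  str + " " * [width - display_width(str), 0].max
end

rows = [["row 1", "a string here"], ["row 2", "한국어 문자열"], ["row 3", "中文字符串"]]
rows.each do |label, text|
  puts "| #{width_ljust(label, 9)} | #{width_ljust(text, 17)} | etc"
end
```

Padding by display width rather than by `length` is what keeps the `|` separators aligned when rows mix English and CJK text.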

Or, if you're only printing full-width characters (i.e. you're not mixing half-width and full-width), you can be lazy and just use full-width forms of everything, including the spacing and the box drawing. Here are a couple of characters you can copy and paste:

  • ｜ (full-width vertical bar, U+FF5C)
  • 　 (full-width ideographic space, U+3000)
  • － (full-width hyphen-minus, U+FF0D; does not get rendered nicely in my terminal font)
  • ー (katakana prolonged sound mark, U+30FC; also works as a full-width dash)
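If you take the all-full-width route, you don't have to paste each character by hand: printable ASCII U+0021..U+007E has full-width counterparts at U+FF01..U+FF5E (a constant offset of 0xFEE0), and the ASCII space corresponds to the ideographic space U+3000. A small sketch, with `to_fullwidth` as a hypothetical helper name:

```ruby
# Map printable ASCII to its full-width form so every character is two
# columns wide: U+0021..U+007E -> U+FF01..U+FF5E (offset 0xFEE0), and
# space -> ideographic space U+3000. Everything else is left unchanged.
FULLWIDTH_OFFSET = 0xFEE0

def to_fullwidth(str)
  str.each_char.map do |c|
    cp = c.ord
    if cp == 0x20
      "\u3000"
    elsif (0x21..0x7E).cover?(cp)
      (cp + FULLWIDTH_OFFSET).chr(Encoding::UTF_8)
    else
      c
    end
  end.join
end

puts to_fullwidth("| row 1 | ") + "한국어"
```

Note that this maps `-` to the full-width hyphen-minus U+FF0D, which may not render well in every terminal font.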