ruby incorrect method behavior (possible depending

2019-05-21 07:22发布

问题:

I got weird behavior from ruby (in irb):

irb(main):002:0> pp "    LS 600"
"\302\240\302\240\302\240\302\240LS 600"

irb(main):003:0> pp "    LS 600".strip
"\302\240\302\240\302\240\302\240LS 600"

That means (for those, who don't understand) that strip method does not affect this string at all, same with gsub('/\s+/', '')

How can I strip that string (I got it while parsing Internet page)?

回答1:

The string "\302\240" is a UTF-8 encoded string (C2 A0) for Unicode code point A0, which represents a non breaking space character. There are many other Unicode space characters. Unfortunately the String#strip method removes none of these.

If you use Ruby 1.9.2, then you can solve this in the following way:

# Ruby 1.9.2 only.
# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".gsub(/^\p{Space}+|\p{Space}+$/, "")

In Ruby 1.8.7 support for Unicode is not as good. You might be successful if you can depend on Rails's ActiveSupport::Multibyte. This has the advantage of getting a working strip method for free. Install ActiveSupport with gem install activesupport and then try this:

# Ruby 1.8.7/1.9.2.
$KCODE = "u"
require "rubygems"
require "active_support/core_ext/string/multibyte"

# Remove any whitespace-like characters from beginning/end.
"\302\240\302\240LS 600".mb_chars.strip.to_s