I can't remove whitespace from a parsed string

Posted 2019-03-28 01:58

Question:

I can't remove whitespace from a string.

My HTML is:

<p class='your-price'>
Cena pro Vás: <strong>139&nbsp;<small>Kč</small></strong>
</p>

My code is:

#encoding: utf-8
require 'rubygems'
require 'mechanize'

agent = Mechanize.new
site  = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")

val = price.first.text  # => "139 "
val.strip               # => "139 "
val.gsub(" ", "")       # => "139 "

gsub, strip, etc. don't work. Why, and how do I fix this?

val.class      # => String
val.dump       # => "\"139\\u{a0}\""   (note the \u{a0}!)
val.encoding   # => #<Encoding:UTF-8>

__ENCODING__               # => #<Encoding:UTF-8>
Encoding.default_external  # => #<Encoding:UTF-8>

I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.

Answer 1:

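strip only removes ASCII whitespace, and the character you've got here is U+00A0, the Unicode no-break space that &nbsp; decodes to. gsub(" ", "") fails for the same reason: " " is the ordinary ASCII space (U+0020), which is a different character.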
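A minimal reproduction of the diagnosis, with no network access needed. It assumes the scraped text is exactly "139" followed by U+00A0, as the val.dump output in the question shows; inspecting the code points makes the invisible character visible.

```ruby
# encoding: utf-8
# Assumption: this is the same string Mechanize returned ("139" + U+00A0)
val = "139\u00a0"

p val.codepoints      # => [49, 51, 57, 160]  (160 is U+00A0, NO-BREAK SPACE)
p val.strip == val    # => true   (strip leaves U+00A0 in place)
p " ".ord             # => 32     (the ASCII space that gsub(" ", "") looks for)
p val.include?(" ")   # => false  (so gsub(" ", "") has nothing to replace)
```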

Removing the character is easy. You can use gsub by providing a regex with the character code: gsub(/\u00a0/, '')
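A short sketch of the targeted fix, again assuming the value is "139" plus U+00A0. String#delete is shown as an alternative that removes the literal character without a regex.

```ruby
# encoding: utf-8
val = "139\u00a0"

# Remove every U+00A0 by matching its code point in a regex
cleaned = val.gsub(/\u00a0/, '')
p cleaned                   # => "139"

# Same result without a regex: delete the literal no-break space
p val.delete("\u00a0")      # => "139"
```

This only handles the no-break space; any other exotic whitespace would survive, which is what the next approach addresses.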

You could also call gsub(/[[:space:]]/, '') to remove all Unicode whitespace, U+00A0 included. For details, see the character-class section of Ruby's Regexp documentation.
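The sketch below contrasts [[:space:]] with \s and shows a Unicode-aware replacement for strip that trims only the ends. The sample string is hypothetical: it mimics 139&nbsp;Kč with extra whitespace around it, so you can see that trimming keeps the inner no-break space while gsub removes it.

```ruby
# encoding: utf-8
# Hypothetical input: leading U+00A0 + space, U+00A0 between price and
# currency (as in 139&nbsp;Kč), and a trailing tab
val = "\u00a0 139\u00a0Kč\t"

# [[:space:]] is Unicode-aware on UTF-8 strings, so it also matches U+00A0
all_removed = val.gsub(/[[:space:]]/, '')
p all_removed == "139Kč"                  # => true

# \s only covers ASCII whitespace, so the no-break spaces survive
p val.gsub(/\s/, '').include?("\u00a0")   # => true

# A Unicode-aware strip: remove whitespace only at the start and end
trimmed = val.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
p trimmed == "139\u00a0Kč"                # => true
```

Use the anchored version when the inner no-break space is meaningful (for example, to keep the price and currency separated) and only the surrounding padding should go.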