I can't remove whitespace from a string.
My HTML is:
<p class='your-price'>
Cena pro Vás: <strong>139 <small>Kč</small></strong>
</p>
My code is:
#encoding: utf-8
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
site = agent.get("http://www.astratex.cz/podlozky-pod-raminka/doplnky")
price = site.search("//p[@class='your-price']/strong/text()")
val = price.first.text    # => "139 "
val.strip                 # => "139 "
val.gsub(" ", "")         # => "139 "
gsub, strip, etc. don't work. Why, and how do I fix this?
val.class                 # => String
val.dump                  # => "\"139\\u{a0}\"" !
val.encoding              # => #<Encoding:UTF-8>
__ENCODING__              # => #<Encoding:UTF-8>
Encoding.default_external # => #<Encoding:UTF-8>
I'm using Ruby 1.9.3, so Unicode shouldn't be a problem.
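strip only removes ASCII whitespace, and the character you've got here is a Unicode no-break space (U+00A0), as the \u{a0} in your val.dump output shows. For the same reason, gsub(" ", "") finds nothing to replace: its pattern is an ordinary ASCII space (U+0020).
Removing the character is easy. You can use gsub with a regex that contains the character code:
gsub(/\u00a0/, '')
You could also call
gsub(/[[:space:]]/, '')
to remove all Unicode whitespace. For details, check the documentation.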
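To make the two gsub calls above concrete, here is a minimal sketch. It assumes the string is "139" followed by a no-break space, as the val.dump output in the question suggests. The sample values and the unicode_strip helper are made up for illustration and are not part of Mechanize or Ruby's standard library.

```ruby
# encoding: utf-8

# Assumed sample value: "139" followed by a no-break space (U+00A0),
# mirroring what val.dump reported in the question.
val = "139\u00a0"

# Remove only the no-break space.
without_nbsp = val.gsub(/\u00a0/, '')

# Remove every character matched by the POSIX bracket class [[:space:]],
# which also matches the no-break space in a UTF-8 string.
without_space = val.gsub(/[[:space:]]/, '')

# Hypothetical helper: trim Unicode whitespace from both ends only,
# leaving whitespace inside the string untouched.
def unicode_strip(str)
  str.gsub(/\A[[:space:]]+|[[:space:]]+\z/, '')
end

p without_nbsp
p without_space
p unicode_strip("\u00a0 139 \u00a0")
```

If the goal is a number rather than a cleaned string, calling to_i on the result (for example without_space.to_i) may be a reasonable next step.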