Hpricot - UTF-8 issues

Posted 2019-08-14 13:01

I get the following error when running the code below:

invalid byte sequence in UTF-8 (ArgumentError)

The code:

require 'hpricot'
require 'open-uri'

doc = open('http://www.amazon.co.jp/') {|f| Hpricot(f.read) }
puts doc.to_html

Hpricot cannot parse the Japanese content. Any suggestions on fixing this issue?

1 answer
乱世女痞
#2 · 2019-08-14 13:11

The site doesn't seem to be using UTF-8; the page declares Shift_JIS: <meta http-equiv="content-type" content="text/html; charset=Shift_JIS" />.

Try this instead:

open('http://www.amazon.co.jp/') {|f| Hpricot(f.read.encode("UTF-8")) }
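One caveat: encode("UTF-8") with a single argument transcodes from whatever encoding the string is already tagged with, so it may not help if Ruby has already labelled the bytes as UTF-8. A minimal sketch of a more explicit variant follows, assuming the body really is Shift_JIS. The helper name to_utf8 is hypothetical, and the sample bytes are generated locally so the snippet runs without network access.

```ruby
# Hypothetical helper (not part of Hpricot or open-uri): relabel raw bytes
# with their real source encoding, then transcode them to UTF-8.
# Bytes that cannot be converted are replaced instead of raising.
def to_utf8(raw, source_encoding = "Shift_JIS")
  raw.dup
     .force_encoding(source_encoding)
     .encode("UTF-8", invalid: :replace, undef: :replace)
end

# Simulate a Shift_JIS response body: three Japanese characters,
# transcoded to Shift_JIS and then tagged as plain binary data.
sjis_bytes = "\u65E5\u672C\u8A9E".encode("Shift_JIS").b

utf8 = to_utf8(sjis_bytes)
puts utf8.encoding
puts utf8.valid_encoding?
```

With open-uri this would be used as Hpricot(to_utf8(f.read)) inside the block. On Ruby 3.0 and later, open no longer accepts URLs, so URI.open('http://www.amazon.co.jp/') would be needed in place of open.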