How to set the mechanize page encoding?

Posted 2019-02-22 17:00


I'm trying to fetch a page encoded in ISO-8859-1 by clicking on a link, so the code is similar to this:

page_result = page.link_with( :text => 'link_text' ).click

So far I get the result with a wrong encoding, so I see characters like:

'T�tulo:' instead of 'Título:'

I've tried several approaches, including:

  • Stating the encoding in the first request using the agent like:

    @page_search = @agent.get(
      :url => '',
      :headers => { 'Accept-Charset' => 'ISO-8859-1' } )
  • Stating the encoding for the page itself

      page_result.encoding = 'ISO-8859-1'

But I must be doing something wrong: a simple puts always shows the wrong characters.

Do you know how to state the encoding?

Thanks in advance,

Added: Executable example:

require 'rubygems'
require 'mechanize'

WWW::Mechanize::Util::CODE_DIC[:SJIS] = "ISO-8859-1"

@agent = WWW::Mechanize.new

@page = @agent.get(
  :url => '',
  :headers => { 'Accept-Charset' => 'utf-8' } )

puts @page.body


Hey, you can just do: page.encoding = 'utf-8'

Hope it helps!


The previous answer is correct, but in my code it looks slightly different:

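agent = Mechanize.new

page = agent.get('')

page.encoding = 'windows-1251'

# The call in front of ('p') is missing from the post; page.search is assumed here.
page.search('p').each do |para|
  puts para.text
end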

Sorry, it was my mistake: I come from a Java background, where strings are converted to UTF-16 internally, and I forgot that Ruby doesn't do that. Mechanize was retrieving the page flawlessly, but I needed to convert the data via iconv.

Mental note: Ruby stores strings without converting their encoding.
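
For readers on Ruby 1.9 or later, the conversion step can be sketched without iconv (the Iconv library was removed from the standard library in Ruby 2.0). This is only a minimal sketch: latin1_to_utf8 is a hypothetical helper, not part of Mechanize, and the sample bytes stand in for a page body that is assumed to really be ISO-8859-1.

```ruby
# Raw bytes as they might arrive from an ISO-8859-1 page:
# "Título:" with the accented i stored as the single byte 0xED.
raw = "T\xEDtulo:".b

# Hypothetical helper (not a Mechanize API): label the bytes with the
# encoding they are assumed to be in, then transcode them to UTF-8.
def latin1_to_utf8(bytes)
  bytes.dup.force_encoding(Encoding::ISO_8859_1).encode(Encoding::UTF_8)
end

utf8 = latin1_to_utf8(raw)
puts utf8  # prints "Título:"
```

force_encoding only changes the label Ruby attaches to the bytes; encode is the call that actually rewrites them, which is the step iconv performed in the original fix.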


Yeah, Mechanize will try to detect the encoding itself (it uses the NKF library that ships with Ruby to guess it) and sometimes fails.

Maybe this might help:
WWW::Mechanize::Util::CODE_DIC[:SJIS] = "ISO-8859-1"

I'm not too sure about the exact syntax, but I think the CODE_DIC Hash might be a good place to look :)
I had a similar problem a while back.
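
When it is unclear whether the detection guessed wrong or the bytes were simply never transcoded, it can help to check how Ruby labels a string and whether the bytes are valid under that label. The following is a rough sketch for Ruby 2.1 or later (String#scrub); describe_encoding is a hypothetical helper, not part of Mechanize, and the sample bytes are assumed ISO-8859-1 input.

```ruby
# Hypothetical helper (not a Mechanize API): report the encoding label
# on a string and whether its bytes are valid under that label.
def describe_encoding(str)
  { label: str.encoding.name, valid: str.valid_encoding? }
end

latin1_bytes = "T\xEDtulo:".b  # raw ISO-8859-1 bytes for "Título:"

# The same bytes under two different labels; no bytes are changed here.
as_utf8   = latin1_bytes.dup.force_encoding(Encoding::UTF_8)
as_latin1 = latin1_bytes.dup.force_encoding(Encoding::ISO_8859_1)

p describe_encoding(as_utf8)    # valid is false: 0xED alone is not valid UTF-8
p describe_encoding(as_latin1)  # valid is true: every byte is a Latin-1 character

# Replacing the invalid byte reproduces the symptom from the question.
puts as_utf8.scrub
```

If the label is UTF-8 but valid_encoding? is false, the bytes are most likely in a legacy encoding and still need to be converted.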