How to set the Referer header before loading a page

Posted 2019-05-10 06:21

Question:

Is there a straightforward way to set custom headers with Mechanize 2.3?

I tried a solution from an earlier answer, but I get an error:

$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p|
  p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main'
} 

# ./mech.rb:30:in `<main>': undefined method `pre_connect_hooks' for nil:NilClass (NoMethodError)

Answer 1:

The docs say:

get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }

so for example:

agent.get 'http://www.google.com/', [], agent.page.uri, {'foo' => 'bar'}

alternatively you might like:

agent.request_headers = {'foo' => 'bar'}
agent.get url


Answer 2:

You misread the code you were copying. The original example had a newline between the two statements, but it disappeared because the snippet wasn't formatted as code. With both statements on one line, Ruby evaluates $agent.pre_connect_hooks as an argument to Mechanize.new before the assignment happens, so $agent is still nil at that point. Initialize the object first, then use it:

$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p| p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main' }
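
To see what the hook body does in isolation, here is a minimal sketch that uses only the standard library, with no Mechanize and no network access. It assumes the hook is handed a Net::HTTP request object, which is what the p[:request]['Referer'] assignment above operates on. The two-argument (agent, request) form is my assumption based on the Mechanize 2.x documentation, so check it against the version you have installed. The name set_referer is made up for illustration.

```ruby
require 'net/http'
require 'uri'

# Hypothetical stand-in for the hook. The (agent, request) argument order is
# an assumption about Mechanize 2.x, not something confirmed in this thread.
set_referer = lambda { |_agent, request|
  request['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main'
}

# Build a request object without sending it, then run the hook by hand.
request = Net::HTTP::Get.new(URI('https://wwws.mysite.com/cgi-bin/apps/Main'))
set_referer.call(nil, request)

# Header lookup is case-insensitive, so 'referer' would work as well.
puts request['Referer']
```

If the header prints as expected here but the real request still goes out without it, the hook is probably being called with different arguments than the lambda expects.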


Answer 3:

For this question I noticed people seem to use:

page = agent.get("http://www.you.com/index_login/", :referer => "http://www.you.com/")

As an aside, now that I have tested this answer, it seems the Referer was not the cause of my actual problem: every visit to the site I'm scraping requires going through the login pages again, even seconds after the first logged-in visit, although I always load and save the complete cookie jar in YAML format. But that would be a separate question, of course.