short: How to execute/simulate JavaScript redirection with Python mechanize?
I've made a Python script with the mechanize module that looks for a link in a page and follows it.
The problem is that on one particular site, when I do
br.follow_link("http://www.address1.com")
it redirects me to this simple page:
<script language="JavaScript">{
location.href="http://www.site2.com/";
self.focus();
}</script>
Now, if I do:
br = mechanize.Browser(factory=mechanize.RobustFactory())
... #other code
br.follow_link("http://www.address1.com")
for link in br.links():
    br.follow_link(link)
    print(link)
it doesn't print anything, which means mechanize finds no links on that page. But if I parse the page manually and execute:
br.open("http://www.site2.com")
Site2 doesn't recognize that I'm coming from "www.address1.com", and the script does not work as I would like!
Sorry if it's just a newbie question and thank you in advance!
P.S. I have br.set_handle_referer(True)
EDIT: More info. Inspecting that request with Fiddler2, it looks like:
GET http://www.site2.com/ HTTP/1.1
Host: www.site2.com
Connection: keep-alive
User-Agent: Mozilla/5.0 (Windows NT 6.2; WOW64) AppleWebKit/537.4 (KHTML, like Gecko) Chrome/22.0.1229.94 Safari/537.4
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Referer: http://www.address1.com
Accept-Encoding: gzip,deflate,sdch
Accept-Language: it-IT,it;q=0.8,en-US;q=0.6,en;q=0.4
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.3
Cookie: PHPSESSID=6e161axxxxxxxxxxx; user=myusername; pass=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx; ip=79.xx.xx.xx; agent=a220243a8b8f83de64c6204a5ef7b6eb; __utma=154746788.943755841.1348303404.1350232016.1350241320.43; __utmb=154746788.12.10.1350241320; __utmc=154999999; __utmz=154746788.134999998.99.6.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=%something%something%
So it seems to be a cookie problem?
I solved it! In this way:
Mechanize can't deal with JavaScript, since it can't interpret it. Try parsing the page manually and passing the extracted link to br.follow_link.
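As a rough illustration of the "parse it manually" step, here is a minimal sketch that pulls the target of a location.href assignment out of the page source with a regular expression. The helper name find_js_redirect and the pattern are assumptions made for this example, not part of mechanize, and the pattern only covers simple quoted assignments like the one in the question.

```python
import re

# Matches simple assignments such as location.href="..." or window.location = '...'.
# This is a heuristic, not a JavaScript parser.
_JS_REDIRECT = re.compile(
    r"""(?:window\.|self\.|document\.)?location(?:\.href)?\s*=\s*(["'])(?P<url>.+?)\1"""
)


def find_js_redirect(html):
    """Return the first URL assigned to location / location.href, or None."""
    match = _JS_REDIRECT.search(html)
    return match.group("url") if match else None


# The redirect page from the question.
page = '''<script language="JavaScript">{
location.href="http://www.site2.com/";
self.focus();
}</script>'''

print(find_js_redirect(page))
```

With mechanize, the HTML would presumably come from br.response().read(), which may need decoding to text first on Python 3.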
If you use br.follow_link, hopefully that sets the HTTP referrer to the previous page, whereas br.open is like opening a new window: it doesn't set the HTTP referrer header.
Edit: OK, it looks like
br.follow_link doesn't take strings but takes a special mechanize.Link object with an .absolute_url property. You can fake that, or make a real mechanize.Link, which is less hacky but more tedious. You could also set the HTTP referrer header explicitly before making your request.
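The three options could be sketched roughly as follows. This is only an illustration under assumptions: FakeLink and the three helper functions are hypothetical names, br is assumed to be a mechanize.Browser that has already loaded the redirecting page, the mechanize.Link argument order (base_url, url, text, tag, attrs) is taken from the library source, and mechanize.Request is assumed to accept a headers dict the way urllib's Request does.

```python
class FakeLink(object):
    """Hypothetical stand-in for mechanize.Link.

    follow_link() appears to read only the .absolute_url attribute.
    """

    def __init__(self, absolute_url):
        self.absolute_url = absolute_url


def follow_with_fake_link(br, target_url):
    """Follow target_url as if it were a link on the current page."""
    return br.follow_link(FakeLink(target_url))


def follow_with_real_link(br, base_url, target_url):
    """Less hacky variant: build a genuine mechanize.Link first."""
    import mechanize  # third-party; imported here so the other helpers load without it

    link = mechanize.Link(base_url, target_url, "", "a", [("href", target_url)])
    return br.follow_link(link)


def open_with_referer(br, target_url, referer):
    """Alternative: open target_url with an explicit Referer header."""
    import mechanize  # third-party

    request = mechanize.Request(target_url, headers={"Referer": referer})
    return br.open(request)
```

Here target_url would be the address taken from the location.href assignment, and referer the URL of the page that contained it. The first two helpers rely on br.set_handle_referer(True) so that mechanize adds the Referer header itself.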
More details are in the surprisingly difficult-to-find official docs: http://wwwsearch.sourceforge.net/mechanize/doc.html