Hello there, I was wondering whether it is possible to connect to an HTTP host (for example, google.com) and download the source of the web page?
Thanks in advance.
Google will likely block this request, since it tries to block anything that looks like a robot. Add a User-Agent header to the request.
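As a rough sketch of what setting the header could look like, here is a version using Python 3's urllib.request (the Python 3 home of what urllib2 provided). The User-Agent string and the helper names build_request and fetch_source are made up for illustration, and whether a given site accepts the request is not guaranteed.

```python
import urllib.request

# Hypothetical User-Agent string, chosen only for illustration.
USER_AGENT = "Mozilla/5.0 (compatible; example-fetcher/0.1)"


def build_request(url, user_agent=USER_AGENT):
    """Return a Request for url that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_source(url, timeout=10):
    """Download url and return the page source as text."""
    request = build_request(url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

Calling fetch_source("http://www.google.com/") should then return the HTML as a string, assuming the server accepts the request.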
You can use the urllib2 module.
See the documentation for more examples.
The documentation of httplib (low-level) and urllib (high-level) should get you started. Choose the one that's more suitable for you.
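To give an idea of the low-level route, here is a minimal sketch using http.client, which is the Python 3 name for httplib. The helper names open_connection and fetch_page and the User-Agent value are assumptions for the example, not anything prescribed by the library.

```python
import http.client


def open_connection(host, timeout=10):
    """Create an HTTPConnection; nothing is sent until a request is made."""
    return http.client.HTTPConnection(host, timeout=timeout)


def fetch_page(host, path="/", user_agent="example-fetcher/0.1"):
    """Send a GET request and return (status, body_bytes)."""
    conn = open_connection(host)
    try:
        # user_agent is a hypothetical value; substitute your own.
        conn.request("GET", path, headers={"User-Agent": user_agent})
        response = conn.getresponse()
        return response.status, response.read()
    finally:
        conn.close()
```

Something like status, body = fetch_page("www.google.com") would then give you the status code and the raw bytes. Unlike urlopen, http.client does not follow redirects, so a 301 or 302 status comes back as-is and you have to handle it yourself.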
Here's another approach to this problem, using mechanize. I found that it gets past a website's robot-checking system. I commented out set_all_readonly because, for some reason, it wasn't recognized as a module in mechanize.