My aim is to extract the html from all the links in the first page after entering the google search term. I work behind a proxy so this is my approach.
1.I first used mechanize to enter the search term in the form , ive set the proxies and robots correctly.
2.After extracting the links , Ive used an opener using urllib2.ProxyHandler globally , to open the urls individually.
However this gives me this error. Not able to figure it out.
urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol
Instead of copying and editing Python library modules, you can monkey-patch ssl.wrap_socket() in the ssl module by overriding the ssl_version keyword parameter. The following code can be used as-is. Put this at the start of your program before making any requests.
import ssl
from functools import wraps
def sslwrap(func):
@wraps(func)
def bar(*args, **kw):
kw['ssl_version'] = ssl.PROTOCOL_TLSv1
return func(*args, **kw)
return bar
ssl.wrap_socket = sslwrap(ssl.wrap_socket)
Its a known bug, how ever some solutions for it are mentioned in the comments of this link. See them , May be helpful to you, bug url.