Question:

My aim is to extract the HTML from every link on the first results page after entering a Google search term. I work behind a proxy, so this is my approach:

1. I first used mechanize to enter the search term into the form. I've configured the proxy and the robots.txt handling correctly.

2. After extracting the links, I built an opener with urllib2.ProxyHandler and installed it globally to open each URL individually.
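For reference, step 2 looks roughly like this in Python 3, where urllib2 became urllib.request. This is a minimal sketch rather than the asker's actual code; the proxy address and the helper names build_proxy_opener and fetch_pages are hypothetical.

```python
import urllib.request


def build_proxy_opener(proxy_url):
    """Build an opener that sends both http and https requests through proxy_url."""
    # proxy_url is a placeholder such as "http://proxy.example:3128" (hypothetical).
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def fetch_pages(urls, timeout=10):
    """Fetch each URL with the globally installed opener; return {url: raw bytes}."""
    pages = {}
    for url in urls:
        # urlopen() uses whichever opener install_opener() registered.
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            pages[url] = resp.read()
    return pages


if __name__ == "__main__":
    opener = build_proxy_opener("http://proxy.example:3128")
    urllib.request.install_opener(opener)  # make the proxy opener global, as in step 2
    # links = [...]  # result URLs extracted with mechanize in step 1
    # pages = fetch_pages(links)
```

Passing a ProxyHandler instance to build_opener() replaces the default one that reads proxy settings from the environment, so only the explicit proxy is used.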

However, this gives me the following error, and I haven't been able to figure it out:

urlopen error [Errno 8] _ssl.c:504: EOF occurred in violation of protocol
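To check whether a failure like this comes from the TLS layer rather than from the proxy or DNS, you can inspect the exception stored in the URLError's reason attribute. A minimal Python 3 sketch, assuming the failing call is urllib.request.urlopen(); the helper name describe_url_error and its return labels are hypothetical.

```python
import ssl
import urllib.error


def describe_url_error(exc):
    """Classify a URLError by the exception stored in its .reason attribute."""
    reason = exc.reason
    if isinstance(reason, ssl.SSLEOFError):
        # The SSL connection was terminated abruptly; this is the error class
        # behind "EOF occurred in violation of protocol".
        return "tls-eof"
    if isinstance(reason, ssl.SSLError):
        return "tls-other"
    # Anything else (a string, socket error, proxy failure, ...) is not TLS-related.
    return "non-tls"


# Usage:
# try:
#     urllib.request.urlopen(url)
# except urllib.error.URLError as exc:
#     print(describe_url_error(exc), exc.reason)
```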

Answer 1:

Instead of copying and editing Python library modules, you can monkey-patch ssl.wrap_socket() in the ssl module so that every call forces the ssl_version keyword argument to ssl.PROTOCOL_TLSv1. The following code can be used as-is; put it at the start of your program, before making any requests.

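import ssl
from functools import wraps

def sslwrap(func):
    """Wrap ssl.wrap_socket so that every call is forced to use TLSv1."""
    @wraps(func)
    def wrapper(*args, **kw):
        # Override whatever protocol version the caller requested.
        kw['ssl_version'] = ssl.PROTOCOL_TLSv1
        return func(*args, **kw)
    return wrapper

# Replace the module-level function. Code that looks up ssl.wrap_socket at
# call time (such as httplib in Python 2.7 releases before 2.7.9) now uses
# TLSv1; newer versions call SSLContext.wrap_socket() and are not affected.
ssl.wrap_socket = sslwrap(ssl.wrap_socket)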
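On current Python 3 versions the monkey-patch above no longer applies: ssl.wrap_socket() was removed in Python 3.12, and ssl.PROTOCOL_TLSv1 is deprecated. A different technique, swapped in here rather than taken from the answer, is to build an ssl.SSLContext with an explicit minimum_version and hand it to urllib.request.HTTPSHandler. This is a minimal sketch; the helper names make_tls_context and build_tls_opener, the proxy URL, and TLS 1.2 as the floor are all assumptions.

```python
import ssl
import urllib.request


def make_tls_context(min_version=ssl.TLSVersion.TLSv1_2):
    """Create a verifying client context that refuses protocols below min_version."""
    # create_default_context() loads the system CA store and enables
    # certificate and hostname verification.
    context = ssl.create_default_context()
    # Pin the lowest protocol version the client will negotiate.
    context.minimum_version = min_version
    return context


def build_tls_opener(proxy_url, context):
    """Return an opener that routes through proxy_url and uses context for HTTPS."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    https = urllib.request.HTTPSHandler(context=context)
    return urllib.request.build_opener(proxy, https)


# Usage (hypothetical proxy address):
# opener = build_tls_opener("http://proxy.example:3128", make_tls_context())
# urllib.request.install_opener(opener)
```

Setting context.maximum_version to the same value as minimum_version would pin one exact version, which is the closest analogue of forcing PROTOCOL_TLSv1; leaving the maximum open lets the server pick the newest version both sides support.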