from lxml import html
import requests
url = "https://website.com/"
page = requests.get(url)
tree = html.fromstring(page.content)
page.content
-> SSLError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:748)
When I run this script, I get the error shown above. How can I fix it?
Since your URL is "an internal corporate URL" (as stated in the comments), I'm guessing the server uses a self-signed certificate, or one issued by a private (self-signed) corporate CA.
If that is in fact the case, you have two options:
(1) pass the path to your corporate CA certificate (including the complete chain of intermediate certificates, if any) to the requests.get() call via the verify argument:
requests.get('https://website.lo', verify='/path/to/certfile')
or (2) disable client-side certificate verification altogether (but beware of the security risks this entails, such as simple man-in-the-middle attacks):
requests.get('https://website.lo', verify=False)
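If you make several requests to the same internal host, option (1) can also be set once on a requests.Session instead of on every call. This is only a minimal sketch: make_session is a hypothetical helper name, and '/path/to/certfile' is the same placeholder path as above, not a real file.

```python
import requests


def make_session(ca_bundle=None):
    """Return a Session that verifies servers against ca_bundle.

    make_session is a hypothetical helper for illustration only.
    ca_bundle is a path to a PEM file with your corporate CA chain;
    if it is None, the default verification (verify=True) is kept.
    """
    session = requests.Session()
    if ca_bundle is not None:
        # Session.verify accepts the same values as the verify argument
        session.verify = ca_bundle
    return session


# Placeholder path, replace it with the location of your CA bundle.
session = make_session("/path/to/certfile")
# page = session.get("https://website.lo")
```

The request itself is left commented out because it only works from inside the network that can reach the host.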
For completeness, the relevant verify parameter is described in the requests.request() docs:
verify -- (optional) Either a boolean, in which case it controls whether we verify
the server's TLS certificate, or a string, in which case it must be a path
to a CA bundle to use. Defaults to True.
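To make the failure from the question easier to diagnose, you could catch requests.exceptions.SSLError and point at the verify parameter described above. A rough sketch, assuming a hypothetical wrapper named fetch (the name, the 10-second timeout, and the error message are my own choices, not part of requests):

```python
import requests


def fetch(url, verify=True):
    """GET url; verify is True, False, or a path to a CA bundle.

    fetch is a hypothetical wrapper for illustration only.
    """
    try:
        return requests.get(url, verify=verify, timeout=10)
    except requests.exceptions.SSLError as exc:
        # Raised when the server certificate cannot be verified,
        # e.g. CERTIFICATE_VERIFY_FAILED as in the question.
        raise RuntimeError(
            f"TLS verification failed for {url}; "
            "pass the path to your CA bundle via verify"
        ) from exc


# page = fetch("https://website.lo", verify="/path/to/certfile")
```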
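Use the code below, which requests the page over plain HTTP instead of HTTPS (this only works if the server also serves the page over HTTP):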
from lxml import html
import requests
url = "http://website.com/"
page = requests.get(url)
tree = html.fromstring(page.content)
page.content