Scraping: SSL: CERTIFICATE_VERIFY_FAILED error for

Posted 2020-02-02 04:12

Question:

I'm practicing the code from 'Web Scraping with Python', and I keep having this certificate problem:

from urllib.request import urlopen 
from bs4 import BeautifulSoup 
import re

pages = set()
def getLinks(pageUrl):
    global pages
    html = urlopen("http://en.wikipedia.org"+pageUrl)
    bsObj = BeautifulSoup(html)
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                #We have encountered a new page
                newPage = link.attrs['href'] 
                print(newPage) 
                pages.add(newPage) 
                getLinks(newPage)
getLinks("")

The error is:

  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 1319, in do_open
    raise URLError(err)
urllib.error.URLError: <urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1049)>
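To confirm that a failure like this really is a certificate problem and not some other network error, one option is to inspect the `reason` attached to the `URLError`. The sketch below assumes Python 3.7 or later (where `ssl.SSLCertVerificationError` exists); the helper names `is_cert_failure` and `try_fetch` are hypothetical and are not part of the book's code.

```python
import ssl
from urllib.error import URLError
from urllib.request import urlopen


def is_cert_failure(exc):
    """Return True if exc is a URLError caused by certificate verification."""
    reason = getattr(exc, "reason", None)
    return isinstance(reason, ssl.SSLCertVerificationError)


def try_fetch(url):
    """Fetch url; return None on a certificate failure, re-raise anything else."""
    try:
        with urlopen(url) as resp:
            return resp.read()
    except URLError as exc:
        if is_cert_failure(exc):
            print("Certificate verification failed:", exc.reason)
            return None
        raise
```

Calling `try_fetch("http://en.wikipedia.org")` on an affected install should print the verification message instead of a full traceback.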

By the way, I was also practicing Scrapy, but kept getting this error: command not found: scrapy (I tried all sorts of solutions online, but none of them worked... really frustrating)

Answer 1:

I ran into this issue once. If you're using macOS, go to Macintosh HD > Applications > Python3.6 folder (or whatever version of Python you're using) and double-click the "Install Certificates.command" file. :D



Answer 2:

To solve this, all you need to do is install the Python certificates. This is a common issue on macOS.

Open these files:

Install Certificates.command
Update Shell Profile.command

Simply run these two scripts and you won't have this issue anymore.

Hope this helps!



Answer 3:

This terminal command:

open /Applications/Python\ 3.7/Install\ Certificates.command

(found at https://stackoverflow.com/a/57614113/6207266) resolved it for me. With my configuration,

pip install --upgrade certifi

had no impact.
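Upgrading certifi alone may have no effect because `urlopen` does not necessarily consult certifi's bundle on its own. As a minimal sketch, assuming certifi is installed, you can point an SSL context at that bundle explicitly and pass it to `urlopen`. The helper names `make_verified_context` and `fetch` are hypothetical; the code falls back to the system defaults if certifi is missing.

```python
import ssl
from urllib.request import urlopen

try:
    import certifi  # third-party package; assumed installed via pip
    CA_FILE = certifi.where()  # path to certifi's CA bundle
except ImportError:
    CA_FILE = None  # fall back to the system default trust store


def make_verified_context(cafile=CA_FILE):
    """Build an SSL context that verifies certificates against cafile."""
    return ssl.create_default_context(cafile=cafile)


def fetch(url, context=None):
    """Fetch url using an explicitly configured, verifying SSL context."""
    if context is None:
        context = make_verified_context()
    with urlopen(url, context=context) as resp:
        return resp.read()
```

With this in place, `fetch("https://en.wikipedia.org")` keeps verification on while using a known CA bundle.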



Answer 4:

For novice users: go to the Applications folder and open the Python 3.7 folder. First run (or double-click) Install Certificates.command, then Update Shell Profile.command.



Answer 5:

Take a look at the posts below. It seems that in later versions of Python, certificates are not pre-installed, which seems to cause this error. You should be able to run the following command to install the certifi package: /Applications/Python\ 3.6/Install\ Certificates.command

Post 1: urllib and "SSL: CERTIFICATE_VERIFY_FAILED" Error

Post 2: Airbrake error: urlopen error [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate
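To check whether your Python install actually has a CA bundle to verify against, you can ask the `ssl` module where it looks. This is only a diagnostic sketch; the function name `describe_ca_paths` is hypothetical.

```python
import os
import ssl


def describe_ca_paths():
    """Return where this Python build's OpenSSL expects CA certificates."""
    paths = ssl.get_default_verify_paths()
    openssl_cafile = paths.openssl_cafile
    return {
        "cafile": paths.cafile,  # None if the file does not exist
        "capath": paths.capath,  # None if the directory does not exist
        "openssl_cafile": openssl_cafile,  # location compiled into OpenSSL
        "openssl_cafile_exists": bool(openssl_cafile)
        and os.path.exists(openssl_cafile),
    }


for name, value in describe_ca_paths().items():
    print(f"{name}: {value}")
```

If both `cafile` and `capath` come back as `None`, that would be consistent with the missing-certificates explanation above.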



Answer 6:

Two steps worked for me:
- Go to Macintosh HD > Applications > Python3.7 folder
- Click on "Install Certificates.command"



Answer 7:

If you're on a Mac, you can just search for Install Certificates.command in Spotlight and hit Enter.



Answer 8:

Sadly, I didn't solve the problem, but I managed to make the code work (almost all of my code has this problem, by the way). The local issuer certificate problem happens under Python 3.7, so I switched back to Python 2.7 QAQ. The changes needed included "from urllib2 import urlopen" instead of "from urllib.request import urlopen". So sad...
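If you do switch between interpreters like this, a small compatibility import avoids editing the script each time. This is a common pattern, shown here as a sketch; it does not address the certificate problem itself.

```python
try:
    # Python 3 location of urlopen
    from urllib.request import urlopen
except ImportError:
    # Python 2 fallback
    from urllib2 import urlopen
```

The rest of the script can then call `urlopen` the same way under either version.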



Answer 9:

For anyone using Anaconda, you can install the certifi package. See more at:

https://anaconda.org/anaconda/certifi

To install, type this line in your terminal:

conda install -c anaconda certifi


Answer 10:

Use the requests library. Try this solution, or just add https:// before the URL. Note that verify=False turns off certificate verification for the request:

import requests
from bs4 import BeautifulSoup
import re

pages = set()
def getLinks(pageUrl):
    global pages
    html = requests.get("http://en.wikipedia.org"+pageUrl, verify=False).text
    bsObj = BeautifulSoup(html)
    for link in bsObj.findAll("a", href=re.compile("^(/wiki/)")):
        if 'href' in link.attrs:
            if link.attrs['href'] not in pages:
                #We have encountered a new page
                newPage = link.attrs['href']
                print(newPage)
                pages.add(newPage)
                getLinks(newPage)
getLinks("")

Check if this works for you



Answer 11:

Change your url from "http://en.wikipedia.org" to "https://en.wikipedia.org".
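If the base URL is built in several places, the scheme change can be applied with a small helper rather than by editing each string. This is a sketch using only the standard library; `force_https` is a hypothetical name. An https:// request still goes through certificate verification, so a missing CA bundle may still need to be fixed separately.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(url):
    """Return url with an http scheme upgraded to https; leave others alone."""
    parts = urlsplit(url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


print(force_https("http://en.wikipedia.org/wiki/Python"))
```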