Unable to download nltk data

Posted 2019-03-29 05:00

Question:

import nltk
nltk.download()

It fails with [SSL: CERTIFICATE_VERIFY_FAILED]. With requests one can pass verify=False, but what can be done here?

UPDATE:

This error persists on Python 3.6, with NLTK 3.0, on Mac OS X 10.7.5:

Changing the index in the NLTK downloader (suggested here) lets the downloader list all of NLTK's files, but attempting to download all of them produces another SSL error (see bottom of photo):

Answer 1:

I had the same problem when trying to configure both NLTK and spaCy. Following the instructions in this question, I was able to overcome the issue. Try running /Applications/Python\ 3.6/Install\ Certificates.command, then retry your NLTK download.



Answer 2:

On macOS 10.12.6 this was solved by entering the following in a bash terminal:

pip install certifi
/Applications/Python\ 3.6/Install\ Certificates.command

After that, the usual method of installing NLTK corpora worked for me:

import nltk
nltk.download()
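
As far as I understand it, the python.org installers for macOS bundle their own OpenSSL, which does not read the system keychain, and Install Certificates.command points that OpenSSL at the certifi CA bundle. A minimal sketch for checking whether your interpreter can currently find a CA bundle, using only the standard library (the helper name describe_ca_locations is made up for illustration):

```python
import os
import ssl


def describe_ca_locations():
    """Report the default CA file and CA directory used by this Python's ssl module.

    Hypothetical helper for illustration; it only inspects paths and makes
    no network connection.
    """
    paths = ssl.get_default_verify_paths()
    report = {}
    for label, location in (
        ("cafile", paths.openssl_cafile),
        ("capath", paths.openssl_capath),
    ):
        report[label] = {
            "location": location,
            # A missing file/directory here is a plausible cause of
            # CERTIFICATE_VERIFY_FAILED, though not the only one.
            "exists": bool(location) and os.path.exists(location),
        }
    return report


for label, info in describe_ca_locations().items():
    print(label, info["location"], "exists" if info["exists"] else "MISSING")
```

If the cafile entry is reported as missing before running the two commands above and present afterwards, that would be consistent with this explanation.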


Answer 3:

If you want to download manually, for example because you need the tokenizers/punkt data, you can download it directly from:

https://raw.githubusercontent.com/nltk/nltk_data/gh-pages/packages/tokenizers/punkt.zip

and place the extracted punkt folder in C:\nltk_data\tokenizers.
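
The extraction step can also be scripted. This is only a sketch, assuming punkt.zip has already been saved locally (for example through a browser); the function name install_nltk_zip and its parameters are made up for illustration and are not part of NLTK:

```python
import zipfile
from pathlib import Path


def install_nltk_zip(zip_path, data_dir, category):
    """Extract a manually downloaded NLTK package zip into <data_dir>/<category>/.

    Hypothetical helper: zip_path is the downloaded archive, data_dir is the
    NLTK data directory (e.g. C:\\nltk_data), category is e.g. "tokenizers".
    """
    target = Path(data_dir) / category
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # punkt.zip contains a top-level punkt/ folder, so this yields
        # <data_dir>/tokenizers/punkt/...
        archive.extractall(target)
    return target


# Example call (commented out because it needs the downloaded file):
# install_nltk_zip("punkt.zip", r"C:\nltk_data", "tokenizers")
```

On macOS or Linux the same idea should work with a data directory such as ~/nltk_data instead of C:\nltk_data.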



Answer 4:

(Adding "certificate verify failed _ssl.c:749" for SEO of this issue.)

Solved for me on Mac, 10.12.2 by using Paul Barry's tip of downloading via Python 2.7 (I can't comment because rep < 50)

Additional problems encountered and fixed: to be able to download NLTK via Python 2.7 (the default Mac Python 2.7 setup), I also had to add the Python folder to ~/.bash_profile, as this comment shows.

Then, since I had set this path variable for 2.7, I had to remove it again once the corpora were downloaded in order to start python3. So remove it from ~/.bash_profile before starting python3.

After all that, I can run "import nltk" and "from nltk.book import *" without issues.



Answer 5:

OK, it's a bit of a hack, but here's what I had to do to be able to use the various NLTK data files in Python 3.x on my Mac laptop (running macOS 10.12.2).

Firstly, note that the certificate error only occurs when I try to download NLTK data using Python 3.x on my Mac (my Ubuntu VM inside of VirtualBox had no such error when using Python 3.x - which is annoying). Just why this causes an error on my Mac is beyond me, especially as the NLTK module installs into Python 3.x using pip with no issues. It's the connection to NLTK's download server which appears to cause the SSL verification issue.

My 'ah ha!' moment came when I realised that NLTK, whether installed into Python 3.x or Python 2.x, stores its downloaded data in the same directory for every version of Python installed on a computer. So I used the Python 2.x which comes pre-installed on macOS to install NLTK, then used nltk.download() within Python 2.x to install the stopwords corpus with no issues. Having done this (in Python 2.x), I then went back into Python 3.x, and this code worked:

import nltk
from nltk.corpus import stopwords
print(stopwords.words('english'))

As I said, it's a bit of a hack, but this technique lets me get the NLTK data installed using Python 2.x, which I can then process with Python 3.x as required.
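
The reason this works is that NLTK searches a list of data directories that does not depend on the Python version. With NLTK installed, nltk.data.path shows the full list. The sketch below checks only two locations that I believe are on that list (directories named in the NLTK_DATA environment variable, and nltk_data in the home directory); the helper name existing_nltk_data_dirs is made up for illustration:

```python
import os
from pathlib import Path


def existing_nltk_data_dirs(candidates=None):
    """Return the candidate NLTK data directories that exist on disk.

    Hypothetical helper. By default it checks only NLTK_DATA and
    ~/nltk_data, which is an assumption and not NLTK's complete search list.
    """
    if candidates is None:
        candidates = []
        env_value = os.environ.get("NLTK_DATA")
        if env_value:
            candidates.extend(env_value.split(os.pathsep))
        candidates.append(str(Path.home() / "nltk_data"))
    return [c for c in candidates if Path(c).is_dir()]


print(existing_nltk_data_dirs())
```

Running this under both Python 2.x-installed data and Python 3.x should list the same directory, which is why the corpus downloaded in one interpreter is visible to the other.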