Scikit Learn and Count Vectorizer Error

Posted 2019-09-05 09:14

Question:

Does anyone know where this ImportError comes from and how to fix it? I'm working from a CSV file to do some text mining. At this point, I'm simply trying to tokenize the words in some job descriptions in the file, then vectorize them and count the dimensions. However, I am getting the error below; the original code follows the error message. I've tried uninstalling and reinstalling Anaconda along with all of the packages. This code runs absolutely fine on my PC (an old Gateway) but does not run on my Mac (2012) running OS X Lion.

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
<ipython-input-49-7fcd55a48eba> in <module>()
----> 1 from sklearn.feature_extraction.text import CountVectorizer
      2 cv = CountVectorizer(lowercase=True)
      3 vector = cv.fit_transform(words).toarray()
      4 print vector.shape

//anaconda/lib/python2.7/site-packages/sklearn/__init__.py in <module>()
     35     # process, as it may not be compiled yet
     36 else:
---> 37     from . import __check_build
     38     from .base import clone
     39     __check_build  # avoid flakes unused variable error

ImportError: cannot import name __check_build


from nltk.tokenize import word_tokenize

# create a list of words for all postings

words = []
for p in postList[:100]:
    temp = word_tokenize(p[2])
    temp2 = [w.lower() for w in temp]
    # join the lowercased tokens with ', ' (no trailing separator)
    string = ', '.join(temp2)
    words.append(string)
print words

from sklearn.feature_extraction.text import CountVectorizer
cv = CountVectorizer(lowercase=True)
vector = cv.fit_transform(words).toarray()
print vector.shape

Answer 1:

The scikit-learn you are importing has not been built correctly. Check your search path:

python -c "import sys; print(sys.path)"

and, if you can, sklearn.__path__ (this may not be possible if the import of sklearn itself fails). Then either rebuild the copy of sklearn you are currently importing (how to do that depends on how it was installed; most likely python setup.py build_ext -i or python setup.py install) or remove it from the search path.
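
Since sklearn.__path__ is out of reach when the import fails, here is a minimal sketch of one way to see which copy of sklearn Python would pick up without actually importing it. It assumes Python 3, where importlib.util.find_spec is available; the helper name locate_package is just made up for this illustration. On the Python 2.7 install from the question, imp.find_module("sklearn") should play a similar role.

```python
import importlib.util
import sys


def locate_package(name):
    """Return the path a top-level package or module would load from, or None.

    For a top-level name, find_spec only asks the import finders where the
    package lives; it does not run the package's __init__.py, so it still
    works when the import itself would raise ImportError.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # nothing by that name on the search path
    if spec.submodule_search_locations:
        # a package: report the directory that holds its __init__.py
        return list(spec.submodule_search_locations)[0]
    # a plain module: report the file it would be loaded from
    return spec.origin


if __name__ == "__main__":
    # the search path, one entry per line, in lookup order
    for entry in sys.path:
        print(entry)
    print("sklearn would be loaded from:", locate_package("sklearn"))
```

If the reported directory is not the one Anaconda installed into (for example, a leftover source checkout earlier on the search path), that is the copy to rebuild or remove.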