I am writing a Python program on Google App Engine that calculates tf-idf using TfidfVectorizer from sklearn.
I have added the sklearn library to the project and import it as:
from sklearn.feature_extraction.text import TfidfVectorizer
However, it fails with "no module named _check_build", even though that module is in the library I added.
Note: I have the same code in pure Python and it works just fine, so there is nothing wrong with the Python syntax or imports; the problem only appears on GAE.
Do you know any way to solve this issue?
If you are not using any GAE-specific tools, try deploying your app on Heroku. It lets you deploy a whole virtual environment with all the installed libraries in it. Scikit-learn, specifically, works on Heroku just fine. Check this GitHub repo for example.
You can't. sklearn has a lot of C-based dependencies, and typically any module whose name starts with a leading _ is a binary module. That's why you are getting a "no module named _check_build" error. I seriously doubt you will get it to run even if you fake some of the C libs, unless they have pure Python analogues.
I have done that in the past, but only where the libs shipped a pure Python version alongside the C-based performance version.
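If all you need from sklearn is the tf-idf weighting itself, a pure Python analogue is small enough to write by hand. The following is only a minimal sketch, not sklearn's implementation: the names tokenize and tfidf are made up for illustration, and the smoothed idf, L2 normalization, and token pattern are my assumptions about TfidfVectorizer's defaults, so compare the output with your working local version before relying on it.

```python
import math
import re
from collections import Counter

# Assumed to mirror TfidfVectorizer's default tokenization:
# lowercase, tokens of two or more word characters.
TOKEN_RE = re.compile(r"\b\w\w+\b", re.UNICODE)


def tokenize(text):
    """Hypothetical helper: split text into lowercase tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Uses a smoothed idf, ln((1 + n) / (1 + df)) + 1, and scales each
    document vector to unit L2 length.
    """
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: the number of documents containing each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    idf = {}
    for term, count in df.items():
        idf[term] = math.log((1.0 + n_docs) / (1.0 + count)) + 1.0

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {}
        for term, tf in counts.items():
            weights[term] = tf * idf[term]
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm > 0:
            for term in weights:
                weights[term] /= norm
        vectors.append(weights)
    return vectors


if __name__ == "__main__":
    docs = ["the cat sat", "the dog sat", "the cat ran"]
    for vector in tfidf(docs):
        print(sorted(vector.items()))
```

This uses only the standard library, so there is no compiled extension for GAE to reject. It returns sparse dicts rather than a matrix, which is fine for small corpora but will be much slower than sklearn on large ones.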