Unable to use TfidfVectorizer in Google App Engine

Posted 2019-02-27 12:40

I am writing a Python program on Google App Engine that calculates tf-idf using TfidfVectorizer from sklearn.

I have added the sklearn library to my project and import it as:

from sklearn.feature_extraction.text import TfidfVectorizer

However, it fails with "No module named _check_build", even though _check_build is in the library I added.

Note: the same code runs fine in plain Python outside GAE, so there is nothing wrong with the syntax or the imports; the problem only appears on GAE.

Do you know any way to solve this issue?

2 answers
Viruses.
#2 · 2019-02-27 13:29

If you are not using any GAE-specific tools, try deploying your app on Heroku instead. It lets you deploy a whole virtual environment with all of its installed libraries. In particular, scikit-learn works fine on Heroku. See this GitHub repo for an example.

冷血范
#3 · 2019-02-27 13:31

You can't. sklearn has a lot of C-based dependencies, and a module whose name starts with an underscore is typically a compiled (binary) module.

That's why you are getting the "No module named _check_build" error: the file is in your library, but GAE cannot load it because it is compiled code.
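To see which parts of a copied library are compiled, a minimal sketch like the following walks a package directory and lists every file whose name ends in one of the interpreter's extension-module suffixes (importlib.machinery.EXTENSION_SUFFIXES, e.g. ".so" on Linux or ".pyd" on Windows). The function name find_extension_modules and the "lib/sklearn" path below are hypothetical, and the sketch assumes Python 3 and a library that was copied into a local folder.

```python
import importlib.machinery
import os


def find_extension_modules(package_dir):
    """Return paths (relative to package_dir) of compiled extension files."""
    # Suffixes this interpreter treats as C extension modules.
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    found = []
    for root, _dirs, files in os.walk(package_dir):
        for name in files:
            if name.endswith(suffixes):
                full_path = os.path.join(root, name)
                found.append(os.path.relpath(full_path, package_dir))
    return sorted(found)
```

For example, find_extension_modules("lib/sklearn") run against a copied scikit-learn package would be expected to list entries such as a __check_build/_check_build file with one of those suffixes. Each entry is a module that an environment restricted to pure Python cannot import.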

I seriously doubt you will get it to run even if you stub out some of the C libraries, unless they have pure-Python equivalents.

I have done this in the past with libraries that shipped both a C-based performance version and a pure-Python version.
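The fallback approach mentioned here usually looks like the sketch below: try the compiled library first and, if the import fails, use a pure-Python implementation. The tokenize and tfidf functions are hypothetical stand-ins, not part of sklearn. They assume TfidfVectorizer's documented defaults (lowercasing, the token pattern (?u)\b\w\w+\b, smoothed idf = ln((1 + n) / (1 + df)) + 1, raw term counts, L2 normalization). They return one dict per document instead of a sparse matrix and may differ from sklearn in edge cases.

```python
import math
import re
from collections import Counter

try:
    # Compiled implementation; fails where C extensions cannot be loaded.
    from sklearn.feature_extraction.text import TfidfVectorizer
except ImportError:
    TfidfVectorizer = None

# Assumed to mirror sklearn's default token_pattern (tokens of 2+ word chars).
TOKEN_RE = re.compile(r"(?u)\b\w\w+\b")


def tokenize(text):
    """Lowercase the text and split it into tokens."""
    return TOKEN_RE.findall(text.lower())


def tfidf(documents):
    """Return (vocabulary, rows); rows[i] maps term -> weight for documents[i]."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    vocabulary = sorted(df)

    # Smoothed idf; the 1.0 forces float division on Python 2 as well.
    idf = {t: math.log((1.0 + n_docs) / (1 + df[t])) + 1 for t in vocabulary}

    rows = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights = {t: c * idf[t] for t, c in counts.items()}
        # L2-normalize each document vector (empty documents stay empty).
        norm = math.sqrt(sum(w * w for w in weights.values()))
        if norm:
            weights = {t: w / norm for t, w in weights.items()}
        rows.append(weights)
    return vocabulary, rows
```

Code that needs tf-idf can then check whether TfidfVectorizer is None and call tfidf(docs) in that case, so the same program works whether or not the compiled library is available.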
