可以将文章内容翻译成中文,广告屏蔽插件可能会导致该功能失效(如失效,请关闭广告屏蔽插件后再试):
问题:
I am trying to load my saved model from s3 using joblib
import pandas as pd
import numpy as np
import json
import subprocess
import sqlalchemy
from sklearn.externals import joblib
ENV = 'dev'
model_d2v = load_d2v('model_d2v_version_002', ENV)
def load_d2v(fname, env):
model_name = fname
if env == 'dev':
try:
model=joblib.load(model_name)
except:
s3_base_path='s3://sd-flikku/datalake/doc2vec_model'
path = s3_base_path+'/'+model_name
command = "aws s3 cp {} {}".format(path,model_name).split()
print('loading...'+model_name)
subprocess.call(command)
model=joblib.load(model_name)
else:
s3_base_path='s3://sd-flikku/datalake/doc2vec_model'
path = s3_base_path+'/'+model_name
command = "aws s3 cp {} {}".format(path,model_name).split()
print('loading...'+model_name)
subprocess.call(command)
model=joblib.load(model_name)
return model
But i was facing this error:
from sklearn.externals import joblib
ImportError: cannot import name 'joblib' from 'sklearn.externals' (C:\Users\prane\AppData\Local\Programs\Python\Python37\lib\site-packages\sklearn\externals\__init__.py)
Then i tried installing joblib directly by doing
import joblib
but it gave me this error
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<stdin>", line 8, in load_d2v_from_s3
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 585, in load
obj = _unpickle(fobj, filename, mmap_mode)
File "/home/ec2-user/.local/lib/python3.7/site-packages/joblib/numpy_pickle.py", line 504, in _unpickle
obj = unpickler.load()
File "/usr/lib64/python3.7/pickle.py", line 1088, in load
dispatch[key[0]](self)
File "/usr/lib64/python3.7/pickle.py", line 1376, in load_global
klass = self.find_class(module, name)
File "/usr/lib64/python3.7/pickle.py", line 1426, in find_class
__import__(module, level=0)
ModuleNotFoundError: No module named 'sklearn.externals.joblib'
Can you tell me how to solve this? Thanks in advance
回答1:
It looks like your existing pickle save file (model_d2v_version_002
) encodes a reference module in a non-standard location – a joblib
that's in sklearn.externals.joblib
rather than at top-level.
The current scikit-learn
documentation only talks about a top-level joblib
– eg in 3.4.1 Persistence example – but I do see a reference in someone else's old issue to a DeprecationWarning in scikit-learn
version 0.21 about an older scikit.external.joblib
variant going away:
Python37\lib\site-packages\sklearn\externals\joblib_init_.py:15:
DeprecationWarning: sklearn.externals.joblib is deprecated in 0.21 and
will be removed in 0.23. Please import this functionality directly
from joblib, which can be installed with: pip install joblib. If this
warning is raised when loading pickled models, you may need to
re-serialize those models with scikit-learn 0.21+.
'Deprecation' means marking something as inadvisable to rely-upon, as it is likely to be discontinued in a future release (often, but not always, with a recommended newer way to do the same thing).
I suspect your model_d2v_version_002
file was saved from an older version of scikit-learn
, and you're now using scikit-learn
(aka sklearn
) version 0.23+ which has totally removed the sklearn.external.joblib
variation. Thus your file can't be directly or easily loaded to your current environment.
But, per the DeprecationWarning
, you can probably temporarily use an older scikit-learn
version to load the file the old way once, then re-save it with the now-preferred way. Given the warning info, this would probably require scikit-learn
version 0.21.x or 0.22.x, but if you know exactly which version your model_d2v_version_002
file was saved from, I'd try to use that. The steps would roughly be:
import sklearn.external.joblib as extjoblib
import joblib
extjoblib.load()
your old file as you'd planned, but then immediately re-joblib.dump()
the file using the top-level joblib
. (You likely want to use a distinct name, to keep the older file around, just in case.)
move/update to your real, modern environment, and only import joblib
(top level) to use joblib.load()
- no longer having any references to `sklearn.external.joblib' in either your code, or your stored pickle files.
回答2:
You can directly use
import joblib
instead of
from sklearn.externals import joblib.
I tried this out and it worked well for me
回答3:
You can import joblib
directly by installing it as a dependency and using import joblib
,
https://joblib.readthedocs.io/en/latest/
回答4:
Maybe your code is outdated. For anyone who aims to use fetch_mldata
in digit handwritten project, you should fetch_openml
instead. (link)
In old version of sklearn:
from sklearn.externals import joblib
mnist = fetch_mldata('MNIST original')
In sklearn 0.23 (stable release):
import sklearn.externals
import joblib
dataset = datasets.fetch_openml("mnist_784")
features = np.array(dataset.data, 'int16')
labels = np.array(dataset.target, 'int')
For more info about deprecating fetch_mldata
see scikit-learn doc
回答5:
In case the execution / call to joblib is within another .py program instead of your own (in such case even you have installed joblib, it still causes error from within the calling python programme unless you change the code, i thought would be messy), I tried to create a hardlink:
(windows version)
Python> import joblib
then inside your sklearn path >......\Lib\site-packages\sklearn\externals
mklink /J ./joblib .....\Lib\site-packages\joblib
(you can work out the above using a ! or %, !mklink....... or %mklink...... inside your Python juptyter notebook , or use python OS command...)
This effectively create a virtual folder of joblib within the "externals" folder
Remarks:
Of course to be more version resilient, your code has to check for the version of sklearn is >= 0.23 again before hand.
This would be alternative to changing sklearn vesrion.
回答6:
you can directly import joblib instead of from sklearn.externals import joblib...
its works