The sklearn.cross_decomposition.PLSSVD class in scikit-learn appears to fail when the response variable has shape (N,) instead of (N,1), where N is the number of samples in the dataset.
However, sklearn.cross_validation.cross_val_score fails when the response variable has shape (N,1) instead of (N,).
How can I use them together?
A code snippet:
import numpy as np
from sklearn import cross_validation
from sklearn.pipeline import Pipeline
from sklearn.cross_decomposition import PLSSVD
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
# x -> (N, 60) numpy array
# y -> (N, ) numpy array
# These are the classifier 'pieces' I'm using
plssvd = PLSSVD(n_components=5, scale=False)
logistic = LogisticRegression(penalty='l2', C=0.5)
scaler = StandardScaler(with_mean=True, with_std=True)
# Here's the pipeline that's failing
plsclf = Pipeline([('scaler', scaler),
                   ('plssvd', plssvd),
                   ('logistic', logistic)])
# Just to show how I'm using the pipeline for a working classifier
logclf = Pipeline([('scaler', scaler),
                   ('logistic', logistic)])
##################################################################
# This works fine
log_scores = cross_validation.cross_val_score(logclf, x, y, scoring='accuracy',
                                              verbose=True, cv=5, n_jobs=4)
# This fails!
pls_scores = cross_validation.cross_val_score(plsclf, x, y, scoring='accuracy',
                                              verbose=True, cv=5, n_jobs=4)
Specifically, it fails in the _center_scale_xy function of cross_decomposition/pls_.pyc with 'IndexError: tuple index out of range' at line 103: y_std = np.ones(Y.shape[1]), because the shape tuple has only one element.
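A minimal NumPy reproduction of that first failure, assuming only NumPy and a made-up toy target rather than the question's actual y: asking for the second element of a 1-D array's shape tuple raises the same IndexError, while an (N,1) array has a column axis and works.

```python
import numpy as np

# Toy 1-D target, shape (N,) with N = 4
y = np.array([0, 1, 1, 0])

caught = False
try:
    # Same expression as the failing line in _center_scale_xy
    np.ones(y.shape[1])
except IndexError as exc:
    caught = True
    print("IndexError:", exc)  # the shape tuple (4,) has no index 1

# With an explicit column axis, Y.shape[1] exists
Y = y.reshape(-1, 1)
y_std = np.ones(Y.shape[1])
print(Y.shape, y_std)
```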
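If I set scale=True in the PLSSVD constructor, it fails in the same function at line 99: y_std[y_std == 0.0] = 1.0, because it is trying to do boolean indexing on a float (y_std is a float because y has only one dimension).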
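The scale=True failure can be reproduced the same way. In this sketch (toy data, assumed for illustration), the standard deviation along axis 0 of a 1-D array is a 0-d NumPy scalar rather than an array, so the masked assignment has nothing to index into. The exact exception type may differ between NumPy versions, so the sketch catches both TypeError and IndexError.

```python
import numpy as np

# Toy 1-D target
y = np.array([0.0, 1.0, 1.0, 0.0])

# A reduction over axis 0 of a 1-D array gives a 0-d NumPy scalar, not an array
y_std = y.std(axis=0)
print(type(y_std).__name__, np.ndim(y_std))

failed = False
try:
    # Same statement as the scale=True code path
    y_std[y_std == 0.0] = 1.0
except (TypeError, IndexError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# With an (N, 1) target the std is a length-1 array and the masked assignment works
Y_std = y.reshape(-1, 1).std(axis=0)
Y_std[Y_std == 0.0] = 1.0
print(Y_std.shape)
```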
It seems like an easy fix would be to make sure the y variable has two dimensions, (N,1).
However:
If I create an array with dimensions (N,1) out of the output variable y, it still fails. To reshape the array, I add this before running cross_val_score:
y = np.transpose(np.array([y]))
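For reference, the transpose trick produces the same (N,1) column as the more common reshape(-1, 1) or y[:, np.newaxis] spellings. A quick check on a toy array (assumed, not the question's data):

```python
import numpy as np

y = np.arange(5)  # toy 1-D target

a = np.transpose(np.array([y]))  # the approach used in the question
b = y.reshape(-1, 1)             # more common spelling
c = y[:, np.newaxis]             # equivalent indexing form
print(a.shape, b.shape, c.shape)
```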
Then it fails in sklearn/cross_validation.py at line 398:
File "my_secret_script.py", line 293, in model_create
scores = cross_validation.cross_val_score(plsclf, x, y, scoring='accuracy', verbose=True, cv=5, n_jobs=4)
File "/Users/my.secret.name/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1129, in cross_val_score
cv = _check_cv(cv, X, y, classifier=is_classifier(estimator))
File "/Users/my.secret.name/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py", line 1216, in _check_cv
cv = StratifiedKFold(y, cv, indices=needs_indices)
File "/Users/my.secret.name/anaconda/lib/python2.7/site-packages/sklearn/cross_validation.py", line 398, in __init__
label_test_folds = test_folds[y == label]
ValueError: boolean index array should have 1 dimension
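The traceback comes from the stratified splitter building a boolean mask from y. A rough NumPy reproduction, assuming a made-up (N,1) target and a 1-D per-sample array standing in for test_folds: comparing an (N,1) array with a label yields an (N,1) mask, which cannot index a 1-D array. Older NumPy (such as 1.8) raised ValueError here and newer releases raise IndexError, so the sketch catches both. Flattening the mask with ravel() avoids the error, which is why the splitter wants y to stay 1-D.

```python
import numpy as np

# Toy (N, 1) target and a 1-D per-sample array like StratifiedKFold's test_folds
y = np.array([0, 1, 1, 0, 1, 0]).reshape(-1, 1)
test_folds = np.arange(y.shape[0])

mask = (y == 1)  # shape (6, 1): a 2-D boolean mask
failed = False
try:
    test_folds[mask]
except (IndexError, ValueError) as exc:
    failed = True
    print(type(exc).__name__, exc)

# A 1-D mask (or a 1-D y) indexes the per-sample array fine
selected = test_folds[mask.ravel()]
print(selected)
```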
I'm running this on OS X with NumPy version 1.8.0 and scikit-learn version 0.15-git.
Is there any way to use PLSSVD together with cross_val_score?