I want to train a LogisticRegression and a RandomForestClassifier and combine their scores using GaussianNB:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
logit = LogisticRegression(random_state=0)
logit.fit(X, y)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
randf.fit(X, y)
# Stack the class-0 probabilities of both models as two feature columns
X1 = np.transpose([logit.predict_proba(X)[:, 0], randf.predict_proba(X)[:, 0]])
nb = GaussianNB()
nb.fit(X1, y)
How can I do this with a Pipeline, so that I can pass it to cross_validate and GridSearchCV?
P.S. I suppose I could define my own class that implements the fit and predict_proba methods, but I would expect there to be a standard way to do this...
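No, there is nothing built into sklearn that does what you want without writing some custom code. You can run some parts of your code in parallel with FeatureUnion and chain the whole task together with Pipeline, but you need to write a custom transformer that forwards the output of predict_proba to the transform method.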
Something like this:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)
# This is the custom transformer that will convert
# predict_proba() to pipeline friendly transform()
class PredictProbaTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, clf=None):
        self.clf = clf
    def fit(self, X, y):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self
    def transform(self, X):
        if self.clf is not None:
            # Drop the 2nd column but keep 2d shape
            # because FeatureUnion wants that
            return self.clf.predict_proba(X)[:, [0]]
        return X
    # This method is important for correct working of pipeline
    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)
logit = LogisticRegression(random_state=0)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(logit)),
        ('randf', PredictProbaTransformer(randf)),
        # You can add more classifiers with custom wrapper like above
    ])),
    ('nb', GaussianNB())])
pipe.fit(X, y)
Now you can simply call pipe.predict() and everything will be done correctly.
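Since the question asks about cross_validate and GridSearchCV, here is a minimal sketch of how the pipeline could be passed to both. It repeats the transformer so that it runs on its own. The grid values and the number of folds are arbitrary choices for illustration, not recommendations. The parameter names follow the usual step-name convention: step `stack`, then the union member (`logit` or `randf`), then the wrapper's `clf` attribute, then the classifier's own parameter.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline


class PredictProbaTransformer(BaseEstimator, TransformerMixin):
    """Same wrapper as in the answer: exposes predict_proba() as transform()."""

    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Keep only the class-0 probability, as a 2D column
            return self.clf.predict_proba(X)[:, [0]]
        return X

    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)


X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(LogisticRegression(random_state=0))),
        ('randf', PredictProbaTransformer(
            RandomForestClassifier(n_estimators=100, max_depth=2,
                                   random_state=0))),
    ])),
    ('nb', GaussianNB()),
])

# Cross-validate the whole stack as a single estimator
cv_results = cross_validate(pipe, X, y, cv=5)
print(cv_results['test_score'])

# Tune parameters of the wrapped classifiers through the nested names
# (illustrative grid only)
param_grid = {
    'stack__logit__clf__C': [0.1, 1.0, 10.0],
    'stack__randf__clf__max_depth': [2, 4],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_)
```

Because the wrapper stores the classifier as a constructor argument, cloning and set_params reach into it, which is what lets the nested parameter names work.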
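For more information on FeatureUnion, you can look at my other answer to a similar question here:
- Using the predicted probabilities of one model to train another model and saving them as a single model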
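As a side note, scikit-learn 0.22 and later ship sklearn.ensemble.StackingClassifier, which covers a similar use case without a custom class. The sketch below assumes such a version is installed. It is not an exact equivalent of the pipeline above: the final estimator is trained on cross-validated predictions of the base models rather than on their in-sample predictions, so the results will generally differ.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

stack = StackingClassifier(
    estimators=[
        ('logit', LogisticRegression(random_state=0)),
        ('randf', RandomForestClassifier(n_estimators=100, max_depth=2,
                                         random_state=0)),
    ],
    final_estimator=GaussianNB(),
    # Feed the base models' probabilities (not hard labels) to GaussianNB
    stack_method='predict_proba',
    cv=5,
)
stack.fit(X, y)

# The stacked model is an ordinary estimator, so it can be cross-validated
scores = cross_validate(stack, X, y, cv=3)['test_score']
print(scores)
```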