How to compose sklearn estimators using another estimator

Posted 2019-08-26 10:10

I want to train a LogisticRegression and a RandomForestClassifier and combine their scores using a GaussianNB:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

logit = LogisticRegression(random_state=0)
logit.fit(X, y)

randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
randf.fit(X, y)

X1 = np.transpose([logit.predict_proba(X)[:,0], randf.predict_proba(X)[:,0]])

nb = GaussianNB()
nb.fit(X1, y)

How do I do this with Pipeline so that I can pass it to cross_validate and GridSearchCV?

PS. I suppose I can define my own class implementing the fit and predict_proba methods, but I thought that there should be a standard way to do it...

1 Answer

迷人小祖宗 · 2019-08-26 10:42

No, there is nothing built into sklearn that does what you want without writing some custom code. You can run the two base models in parallel with FeatureUnion and sequence the whole task with Pipeline, but you need to write custom transformers that forward the output of predict_proba to the transform method.

Something like this:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# This custom transformer wraps a classifier and exposes its
# predict_proba() output through a pipeline-friendly transform()
class PredictProbaTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y=None):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Keep only the class-0 probability, but keep the 2-d shape
            # because FeatureUnion expects a 2-d array
            return self.clf.predict_proba(X)[:, [0]]
        return X

    # Define fit_transform explicitly so the pipeline fits the wrapped
    # classifier and then transforms with it in one step
    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)

logit = LogisticRegression(random_state=0)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(logit)),
        ('randf', PredictProbaTransformer(randf)),
        # You can add more classifiers wrapped the same way
    ])),
    ('nb', GaussianNB()),
])

pipe.fit(X, y)

Now you can simply call pipe.predict() and everything will be handled correctly.
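Because the whole stack is now a single estimator, you can also pass it straight to cross_validate and GridSearchCV. A minimal sketch (the parameter paths follow from the step names defined above; the grid values are just placeholders, not tuned recommendations):

from sklearn.model_selection import cross_validate, GridSearchCV

# Cross-validate the whole stacked pipeline as one estimator
scores = cross_validate(pipe, X, y, cv=5, scoring='accuracy')
print(scores['test_score'])

# Tune the nested estimators; parameters are addressed as
# <pipeline step>__<FeatureUnion item>__clf__<parameter>
param_grid = {
    'stack__logit__clf__C': [0.1, 1.0, 10.0],
    'stack__randf__clf__max_depth': [2, 4],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_)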

For more information about FeatureUnion, you can look at my other answer to a similar question.
