I want to train a LogisticRegression and a RandomForestClassifier and combine their scores using a GaussianNB:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# Fit the two base classifiers on the raw features
logit = LogisticRegression(random_state=0)
logit.fit(X, y)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
randf.fit(X, y)

# Stack the class-0 probabilities of both models into a two-column matrix
X1 = np.transpose([logit.predict_proba(X)[:, 0], randf.predict_proba(X)[:, 0]])

# Fit the combiner on the stacked scores
nb = GaussianNB()
nb.fit(X1, y)
How do I do this with Pipeline so that I can pass it to cross_validate and GridSearchCV?
PS. I suppose I can define my own class implementing the fit and predict_proba methods, but I thought that there should be a standard way to do it...
No, there is nothing built into sklearn that does what you want without writing some custom code. You can parallelize some parts of your code by using FeatureUnion, and sequence the whole task using Pipeline, but you need to write custom transformers that forward the output of predict_proba to the transform method. Something like this:
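A minimal sketch of such a setup, assuming a hypothetical wrapper class named ProbaTransformer (it is not part of sklearn, and the step names "stack", "logit", "randf" and "nb" are illustrative choices). The wrapper fits a classifier and returns its class-0 probability from transform, so FeatureUnion can place the two scores side by side before GaussianNB sees them:

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline


class ProbaTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: exposes a classifier's predict_proba as transform."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        # Fit a clone so the estimator passed to the constructor stays unfitted.
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # Keep only the class-0 probability, as a single column, as in the question.
        return self.estimator_.predict_proba(X)[:, [0]]


X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

pipe = Pipeline([
    # FeatureUnion fits both wrapped classifiers and concatenates their outputs.
    ("stack", FeatureUnion([
        ("logit", ProbaTransformer(LogisticRegression(random_state=0))),
        ("randf", ProbaTransformer(RandomForestClassifier(
            n_estimators=100, max_depth=2, random_state=0))),
    ])),
    # GaussianNB is trained on the two-column matrix of scores.
    ("nb", GaussianNB()),
])
pipe.fit(X, y)
stacked = pipe.named_steps["stack"].transform(X)
```

Note that, like the code in the question, this trains GaussianNB on probabilities predicted for the same samples the base classifiers were fitted on.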
Now you can simply call pipe.predict() and everything will be done correctly.
For more information about FeatureUnion, you can look at my other answer to a similar question.
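Since the question asks about cross_validate and GridSearchCV, here is a hedged sketch of how a pipeline of this shape might be passed to both. ProbaTransformer is a hypothetical wrapper class (not an sklearn class), the step names are illustrative, and the grid values are arbitrary. Nested parameters are addressed with double underscores, following the step names down to the wrapped estimator:

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline


class ProbaTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical wrapper: exposes a classifier's predict_proba as transform."""

    def __init__(self, estimator):
        self.estimator = estimator

    def fit(self, X, y=None):
        self.estimator_ = clone(self.estimator).fit(X, y)
        return self

    def transform(self, X):
        # Single column holding the class-0 probability.
        return self.estimator_.predict_proba(X)[:, [0]]


X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

pipe = Pipeline([
    ("stack", FeatureUnion([
        ("logit", ProbaTransformer(LogisticRegression(random_state=0))),
        ("randf", ProbaTransformer(RandomForestClassifier(
            n_estimators=100, max_depth=2, random_state=0))),
    ])),
    ("nb", GaussianNB()),
])

# The whole pipeline is refitted on each training fold.
scores = cross_validate(pipe, X, y, cv=5)

# Parameter path: pipeline step -> union member -> wrapper attribute -> parameter.
param_grid = {
    "stack__logit__estimator__C": [0.1, 1.0, 10.0],
    "stack__randf__estimator__max_depth": [2, 4],
}
search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
best = search.best_params_
```

Because the wrapper stores the classifier as a constructor argument and inherits from BaseEstimator, get_params and set_params can reach the inner hyperparameters, which is what lets GridSearchCV tune them.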