Below is my pipeline. It seems that I can't pass parameters to my models through the ModelTransformer class, which I took from this link (http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html).
The error message makes sense to me, but I don't know how to fix it. Any ideas? Thanks.
```python
from pandas import DataFrame
from sklearn import preprocessing
from sklearn.base import TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction import DictVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.svm import SVC

# ModelTransformer class, taken from
# http://zacstewart.com/2014/08/05/pipelines-of-featureunions-of-pipelines.html
class ModelTransformer(TransformerMixin):
    def __init__(self, model):
        self.model = model

    def fit(self, *args, **kwargs):
        self.model.fit(*args, **kwargs)
        return self

    def transform(self, X, **transform_params):
        # Use the wrapped model's predictions as a feature column
        return DataFrame(self.model.predict(X))

# define a pipeline
pipeline = Pipeline([
    ('vect', DictVectorizer(sparse=False)),
    ('scale', preprocessing.MinMaxScaler()),
    ('ess', FeatureUnion(n_jobs=-1,
        transformer_list=[
            ('rfc', ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))),
            ('svc', ModelTransformer(SVC(random_state=1))),
        ],
        transformer_weights=None)),
    ('es', EnsembleClassifier1()),  # my own final estimator (definition not shown)
])

# define the parameters for the pipeline
parameters = {
    'ess__rfc__n_estimators': (100, 200),
}

grid_search = GridSearchCV(pipeline, parameters, n_jobs=-1, verbose=1, refit=True)
```
Error Message: ValueError: Invalid parameter n_estimators for estimator ModelTransformer.
`GridSearchCV` has a special naming convention for nested objects. In your case `ess__rfc__n_estimators` stands for `ess.rfc.n_estimators`, and, according to the definition of the `pipeline`, it points to the property `n_estimators` of `ModelTransformer(RandomForestClassifier(n_jobs=-1, random_state=1, n_estimators=100))`. Obviously, `ModelTransformer` instances don't have such a property.

The fix is easy: in order to access the underlying object of `ModelTransformer`, one needs to use its `model` field. So, the grid parameters become `{'ess__rfc__model__n_estimators': (100, 200)}`.

P.S. This is not the only problem with your code. In order to use multiple jobs in `GridSearchCV`, you need to make all the objects you're using copyable. This is achieved by implementing the methods `get_params` and `set_params`; you can borrow them from `BaseEstimator`.
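A minimal end-to-end sketch of both fixes, assuming a recent scikit-learn (with `sklearn.model_selection`) and pandas. To keep it self-contained, `DictVectorizer` is dropped, made-up toy data comes from `make_classification`, and `LogisticRegression` stands in for `EnsembleClassifier1`, which isn't shown in the question. It checks that the wrapped forest's parameter is exposed as `ess__rfc__model__n_estimators` and then runs the grid search with it.

```python
# Sketch under stated assumptions: recent scikit-learn + pandas.
# LogisticRegression is a stand-in for the question's EnsembleClassifier1,
# DictVectorizer is omitted, and make_classification gives toy data.
import numpy as np
from pandas import DataFrame
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import FeatureUnion, Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC


class ModelTransformer(BaseEstimator, TransformerMixin):
    """Use a wrapped model's predictions as a feature column.

    BaseEstimator supplies get_params/set_params by reading the
    constructor argument ``model`` back from ``self.model``.
    """

    def __init__(self, model):
        self.model = model

    def fit(self, X, y=None):
        self.model.fit(X, y)
        return self

    def transform(self, X):
        return DataFrame(self.model.predict(X))


pipeline = Pipeline([
    ("scale", MinMaxScaler()),
    ("ess", FeatureUnion(transformer_list=[
        ("rfc", ModelTransformer(RandomForestClassifier(random_state=1, n_estimators=10))),
        ("svc", ModelTransformer(SVC(random_state=1))),
    ])),
    ("es", LogisticRegression()),  # stand-in for EnsembleClassifier1
])

# The forest's n_estimators is reached through the wrapper's `model` field.
param_names = pipeline.get_params().keys()
print("ess__rfc__model__n_estimators" in param_names)  # True
print("ess__rfc__n_estimators" in param_names)         # False

# clone() rebuilds each ModelTransformer from its get_params() output.
cloned = clone(pipeline)

X, y = make_classification(n_samples=120, n_features=6, random_state=0)
parameters = {"ess__rfc__model__n_estimators": (10, 20)}
grid_search = GridSearchCV(pipeline, parameters, cv=3, n_jobs=1, refit=True)
grid_search.fit(X, y)
print(grid_search.best_params_)
print(np.round(grid_search.cv_results_["mean_test_score"], 3))
```

In recent scikit-learn releases, nested keys like this are resolved by calling `get_params`/`set_params` on each intermediate object, and `GridSearchCV` clones the pipeline for every fit, so inheriting from `BaseEstimator` matters even with `n_jobs=1`.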