I am using sel = SelectFromModel(ExtraTreesClassifier(10), threshold='mean') to select the most important features in my data set.
Then I want to feed these selected features to my Keras classifier. However, my Keras-based neural network needs to know the number of important features selected in the first step. Below is the code for my Keras classifier; the variable X_new is the NumPy array of selected features.
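To make the first step concrete, here is a minimal sketch of how X_new and its column count can be obtained outside of a pipeline. The synthetic data from make_classification, the random_state values, and the names n_selected and mask are assumptions for illustration only, not my real data or code.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel

# Synthetic stand-in data (hypothetical; replace with the real training set).
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

# Same selector as in the question, with a fixed seed for repeatability.
sel = SelectFromModel(ExtraTreesClassifier(n_estimators=10, random_state=0),
                      threshold='mean')

# Fit the selector and keep only the columns it considers important.
X_new = sel.fit_transform(X, y)

# Number of selected features: this is the value the network's input layer needs.
n_selected = X_new.shape[1]

# Boolean mask over the original columns; True marks a selected feature.
mask = sel.get_support()
print(n_selected, int(mask.sum()))
```

Done this way, the count is only known after the selector has been fitted on one particular data set, which is exactly what becomes awkward once the selector is refitted on every cross-validation fold.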
The code for the Keras classifier is as follows.
# Imports assume standalone Keras 2.x and scikit-learn.
import numpy as np
from keras.layers import Dense, Dropout
from keras.models import Sequential
from keras.wrappers.scikit_learn import KerasClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

def create_model(dropout=0.2):
    # Relies on a global X_new, which the pipeline never creates.
    n_x_new = X_new.shape[1]
    np.random.seed(6000)
    model_new = Sequential()
    model_new.add(Dense(n_x_new, input_dim=n_x_new, kernel_initializer='glorot_uniform', activation='sigmoid'))
    model_new.add(Dense(10, kernel_initializer='glorot_uniform', activation='sigmoid'))
    # Use the dropout argument so the grid search over clf__dropout has an effect.
    model_new.add(Dropout(dropout))
    model_new.add(Dense(1, kernel_initializer='glorot_uniform', activation='sigmoid'))
    model_new.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_crossentropy'])
    return model_new

seed = 7
np.random.seed(seed)

clf = KerasClassifier(build_fn=create_model, epochs=10, batch_size=1000, verbose=0)
param_grid = {'clf__dropout': [0.1, 0.2]}
model = Pipeline([('sel', sel), ('clf', clf)])
grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='roc_auc', n_jobs=1)
grid_result = grid.fit(
    np.concatenate((train_x_upsampled, cross_val_x_upsampled), axis=0),
    np.concatenate((train_y_upsampled, cross_val_y_upsampled), axis=0),
)
As I am using Pipeline with grid search, I don't understand how my neural network will get the important features selected in the first step. I want to get those selected features into an array X_new.
Do I need to implement a custom estimator between sel and the Keras model?
If yes, how would I implement one? I know the generic code for a custom estimator, but I am unable to adapt it to my requirement. The generic code is as follows.
class new_features(TransformerMixin):
    def transform(self, X):
        X_new = sel.transform(X)
        return X_new
But this is not working. Is there any way I can solve this problem without using a custom estimator in between?
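For reference, scikit-learn's Pipeline expects every intermediate step to implement fit as well as transform, and the class above defines only transform and depends on a global sel that has to be fitted elsewhere. A minimal sketch of a more complete version might look like the following. The class name NewFeatures, the selector argument, the synthetic data, and the use of LogisticRegression as a stand-in for the Keras classifier are all assumptions made so the example runs without Keras.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline


class NewFeatures(BaseEstimator, TransformerMixin):
    """Hypothetical wrapper that fits a selector inside the pipeline."""

    def __init__(self, selector):
        self.selector = selector

    def fit(self, X, y=None):
        # Fit a fresh copy so each cross-validation fold gets its own selector.
        self.selector_ = clone(self.selector).fit(X, y)
        return self

    def transform(self, X):
        # Keep only the columns chosen during fit.
        return self.selector_.transform(X)


# Synthetic stand-in data (hypothetical; replace with the real training set).
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

sel = SelectFromModel(ExtraTreesClassifier(n_estimators=10, random_state=0),
                      threshold='mean')

# LogisticRegression stands in for the Keras classifier here.
pipe = Pipeline([('new_features', NewFeatures(sel)),
                 ('clf', LogisticRegression(max_iter=1000))])
pipe.fit(X, y)

# Number of columns the final step actually received.
n_selected = int(pipe.named_steps['new_features'].selector_.get_support().sum())
print(n_selected, pipe.predict(X).shape)
```

Since SelectFromModel already implements fit and transform itself, this wrapper does nothing that the plain ('sel', sel) step does not already do. The pipeline passes the reduced array to the last step either way, so the open problem is only how the model-building function learns the number of columns.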