Getting names and number of selected features befo

Posted 2019-07-22 16:52

Question:

I am using sel = SelectFromModel(ExtraTreesClassifier(10), threshold='mean') to select the most important features in my data set.

Then I want to feed these selected features to my Keras classifier. However, my Keras-based neural network classifier needs to know the number of important features selected in the first step. Below is the code for my Keras classifier; the variable X_new is the NumPy array of the selected features.

The code for the Keras classifier is as follows.

def create_model(dropout=0.2):
    # X_new is read from the enclosing scope; it must already hold the selected features
    n_x_new = X_new.shape[1]
    np.random.seed(6000)
    model_new = Sequential()
    model_new.add(Dense(n_x_new, input_dim=n_x_new, kernel_initializer='glorot_uniform', activation='sigmoid'))
    model_new.add(Dense(10, kernel_initializer='glorot_uniform', activation='sigmoid'))
    # use the dropout argument so that the grid over clf__dropout has an effect
    model_new.add(Dropout(dropout))
    model_new.add(Dense(1, kernel_initializer='glorot_uniform', activation='sigmoid'))
    model_new.compile(loss='binary_crossentropy', optimizer='adam', metrics=['binary_crossentropy'])

    return model_new

seed = 7
np.random.seed(seed)

clf = KerasClassifier(build_fn=create_model, epochs=10, batch_size=1000, verbose=0)

param_grid = {'clf__dropout': [0.1, 0.2]}
model = Pipeline([('sel', sel), ('clf', clf)])

grid = GridSearchCV(estimator=model, param_grid=param_grid, scoring='roc_auc', n_jobs=1)
grid_result = grid.fit(np.concatenate((train_x_upsampled, cross_val_x_upsampled), axis=0), np.concatenate((train_y_upsampled, cross_val_y_upsampled), axis=0))

Since I am using Pipeline with grid search, I don't understand how my neural network will receive the important features selected in the first step. I want to get those selected features into the array X_new.
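For context, here is a minimal sketch of how a Pipeline hands data from one step to the next: whatever sel's transform returns is what the final step receives in fit, and the fitted selector reports the chosen columns through get_support(). The synthetic data from make_classification and the LogisticRegression standing in for the Keras model are assumptions, used only so that the sketch runs without TensorFlow.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Synthetic stand-in data (assumption): 200 samples, 20 features.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=5, random_state=0)

sel = SelectFromModel(ExtraTreesClassifier(n_estimators=10, random_state=0),
                      threshold='mean')

# LogisticRegression is only a placeholder for the Keras classifier.
pipe = Pipeline([('sel', sel), ('clf', LogisticRegression(max_iter=1000))])
pipe.fit(X, y)

fitted_sel = pipe.named_steps['sel']
selected_idx = fitted_sel.get_support(indices=True)  # column indices kept
n_selected = len(selected_idx)                        # number of features kept
X_new = fitted_sel.transform(X)                       # array the next step saw

print("selected columns:", selected_idx)
print("X_new shape:", X_new.shape)
```

If the input were a DataFrame, indexing its columns with selected_idx would give the feature names. Note that these values only exist after the selector has been fitted, which is why a model-building function that reads X_new.shape[1] up front has nothing to read.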

Do I need to implement a custom estimator in between sel and keras model?

If yes, how would I implement one? I know the generic code for a custom estimator, but I am unable to adapt it to my requirement. The generic code is as follows.

class new_features(TransformerMixin):
    def transform(self, X):
        X_new = sel.transform(X)
        return X_new

But this is not working. Is there any way I can solve this problem without using a custom estimator in between?
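One possible direction, sketched under stated assumptions rather than offered as a definitive fix: instead of a transformer between sel and the classifier, the final step can postpone building its model until fit is called, where X.shape[1] is already the number of selected features. DeferredBuildClassifier and build_stand_in below are hypothetical names, the data is synthetic, and an sklearn MLPClassifier stands in for the Keras network so the example runs without TensorFlow.

```python
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import Pipeline


def build_stand_in(n_features, dropout):
    # Stand-in for the Keras model (assumption): the real version would build
    # a Sequential model with input_dim=n_features and Dropout(dropout).
    # MLPClassifier has no dropout, so the value is reused as the L2 penalty
    # `alpha` purely to give the grid something to vary.
    return MLPClassifier(hidden_layer_sizes=(n_features, 10), alpha=dropout,
                         max_iter=300, random_state=0)


class DeferredBuildClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical wrapper that builds its model inside fit()."""

    def __init__(self, build_fn=build_stand_in, dropout=0.2):
        self.build_fn = build_fn
        self.dropout = dropout

    def fit(self, X, y):
        # X has already been reduced by the previous pipeline step,
        # so its width is the number of selected features.
        self.n_features_seen_ = X.shape[1]
        self.model_ = self.build_fn(self.n_features_seen_, self.dropout)
        self.model_.fit(X, y)
        self.classes_ = self.model_.classes_
        return self

    def predict(self, X):
        return self.model_.predict(X)

    def predict_proba(self, X):
        return self.model_.predict_proba(X)


# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

sel = SelectFromModel(ExtraTreesClassifier(n_estimators=10, random_state=0),
                      threshold='mean')
model = Pipeline([('sel', sel), ('clf', DeferredBuildClassifier())])
grid = GridSearchCV(model, {'clf__dropout': [0.1, 0.2]},
                    scoring='roc_auc', cv=3, n_jobs=1)
grid.fit(X, y)

best = grid.best_estimator_
n_selected = int(best.named_steps['sel'].get_support().sum())
print("features selected:", n_selected)
print("features seen by clf:", best.named_steps['clf'].n_features_seen_)
```

The design choice here is that no global X_new is needed: each cross-validation fold refits the selector, and the classifier sizes itself to whatever that fold's selector produced.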