How to get feature names from the output of GridSearchCV

Published 2019-07-26 23:02

Question:

I implemented PCA with Naive Bayes using sklearn, and I optimized the number of PCA components using GridSearchCV.

I tried to figure out the feature names of the best estimator, but I was not able to. Here's the code that I have tried.

from sklearn.model_selection import train_test_split  # cross_validation was removed in newer sklearn
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.naive_bayes import GaussianNB
from sklearn import decomposition

features_train, features_test, labels_train, labels_test = \
    train_test_split(features, labels, test_size=0.3, random_state=42)

### A Naive Bayes classifier combined with PCA is used and its accuracy is tested
pca = decomposition.PCA()
pipe = Pipeline(steps=[('pca', pca), ('gaussian_NB', GaussianNB())])
n_components = [3, 5, 7, 9]
clf = GridSearchCV(pipe, dict(pca__n_components=n_components))

clf = clf.fit(features_train, labels_train)
features_pred = clf.predict(features_test)
print("The number of components of the best estimator is",
      clf.best_estimator_.named_steps['pca'].n_components)
print("The best parameters:", clf.best_params_)
estimator = clf.best_estimator_
# This is the line that fails: the pipeline has no 'features' step,
# and PCA has no get_feature_names method
print("The features are:", estimator['features'].get_feature_names())

Answer 1:

You seem to be confusing dimensionality reduction with feature selection. PCA is a dimensionality reduction technique; it does not select features, it looks for a lower-dimensional linear projection. Your resulting features are not your original ones: they are linear combinations of those. Thus if your original features were "width", "height", and "age", after PCA to dimension 2 you end up with features like "0.4 * width + 0.1 * height - 0.05 * age" and "0.3 * height - 0.2 * width".
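A minimal sketch of this point, using made-up data and hypothetical feature names: after fitting, the rows of the PCA's `components_` attribute give the weights that combine the original features into each derived component, so you can print each component as a formula rather than look for a feature name.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.RandomState(42)
X = rng.rand(100, 3)  # 100 samples, 3 original features (hypothetical names below)
feature_names = ["width", "height", "age"]

pca = PCA(n_components=2).fit(X)

# Each row of components_ is one derived feature: a unit-norm vector of
# weights over the original features.
for i, row in enumerate(pca.components_):
    terms = " + ".join(f"{w:.2f}*{name}" for w, name in zip(row, feature_names))
    print(f"PC{i + 1} = {terms}")
```

In the question's pipeline, the same array would be reached via `clf.best_estimator_.named_steps['pca'].components_` after the grid search has been fitted.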



Answer 2:

It seems like this answer might be what you're after. It contains a really good and exhaustive example, too!