Retrieve list of training feature names from classifier

Posted 2020-03-24 05:04

Is there a way to retrieve the list of feature names used to train a classifier, once it has been trained with the fit method? I would like to get this information before applying the model to unseen data. The training data is a pandas DataFrame and, in my case, the classifier is a RandomForestClassifier.

3 Answers
Animai°情兽
Answer 2 · 2020-03-24 05:10

Based on the documentation and previous experience, there is no way to get a list of the features considered in at least one of the splits.

Is your concern that you do not want to use all of your features for prediction, just the ones actually used during training? In that case I suggest listing the feature_importances_ after fitting and eliminating the features that do not seem relevant. Then train a new model with only the relevant features and use those same features for prediction as well.
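A minimal sketch of that workflow, using a toy dataset with hypothetical column names (the 0.05 importance threshold is arbitrary and would need tuning for real data):

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data; the feat_* column names stand in for your DataFrame's columns.
X, y = make_classification(n_samples=200, n_features=6, n_informative=3,
                           random_state=0)
X = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(6)])

clf = RandomForestClassifier(random_state=0).fit(X, y)

# Pair each column name with its impurity-based importance,
# then keep only those above the chosen threshold.
importances = dict(zip(X.columns, clf.feature_importances_))
relevant = [name for name, imp in importances.items() if imp > 0.05]

# Retrain on the reduced feature set and predict with the same columns.
clf2 = RandomForestClassifier(random_state=0).fit(X[relevant], y)
preds = clf2.predict(X[relevant])
```

Keeping `relevant` around (or saving it alongside the model) gives you the exact column list to select from unseen data before calling predict.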

神经病院院长
Answer 3 · 2020-03-24 05:13

I have a solution which works but is not very elegant. This is an old post with no existing solutions so I suppose there are not any.

Create and fit your model. For example:

from sklearn.ensemble import GradientBoostingRegressor

model = GradientBoostingRegressor(**params)
model.fit(X_train, y_train)

Then you can attach the feature names as an attribute, since you know them at training time:

model.feature_names = list(X_train.columns.values)

I typically then dump the model to a binary file to pass it around, but you can skip this step:

import joblib

joblib.dump(model, filename)
loaded_model = joblib.load(filename)

Then you can get the feature names back from the model and use them when you predict:

f_names = loaded_model.feature_names
loaded_model.predict(X_pred[f_names])
(banned account)
Answer 4 · 2020-03-24 05:23

You don't need to know which features were selected during training. Just make sure that, at prediction time, you give the fitted classifier the same features you used during the learning phase.

The Random Forest Classifier will only use the features on which it makes its splits. Those are fixed during training; the others are simply ignored.

If the shape of your test data is not the same as that of the training data, it will throw an error, even if the test data contains all the features used for the splits of your decision trees.
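This can be demonstrated with a small sketch (the column names are hypothetical): predicting with the same columns works, while passing only a subset raises a ValueError.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data with hypothetical column names standing in for your DataFrame.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
X = pd.DataFrame(X, columns=["a", "b", "c", "d"])

clf = RandomForestClassifier(random_state=0).fit(X, y)

preds = clf.predict(X)  # same four columns as at fit time: works

# Passing only a subset of the training columns raises a ValueError,
# even if some of the dropped columns were never used for a split.
try:
    clf.predict(X[["a", "b"]])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```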

What's more, since Random Forests randomly select features for each of their decision trees (called estimators in sklearn), all of the features are likely to be used at least once.


However, if you want to know which features were used, you can simply inspect the n_features_ and feature_importances_ attributes on your classifier once it is fitted.
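A short sketch of inspecting those attributes on a toy model (column names are hypothetical). Note that in scikit-learn 1.0+ the count attribute is n_features_in_ (n_features_ was later removed), and fitting on a DataFrame also records feature_names_in_, which directly answers the original question in recent versions:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data with hypothetical column names.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
X = pd.DataFrame(X, columns=["a", "b", "c", "d"])

clf = RandomForestClassifier(random_state=0).fit(X, y)

n_used = clf.n_features_in_             # number of features seen during fit
importances = clf.feature_importances_  # impurity-based importance per feature

# scikit-learn >= 1.0 records the DataFrame's column names at fit time;
# fall back to the DataFrame itself on older versions.
names = list(getattr(clf, "feature_names_in_", X.columns))
```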

You can look here to see how you can retrieve the names of the most important features you used.
