Show feature names after feature selection

Posted 2019-07-17 19:14

I need to build a text classifier. At the moment I use TfidfVectorizer and SelectKBest to select the features, as follows:

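from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# decode_error replaces the charset_error parameter, which later scikit-learn releases removed
vectorizer = TfidfVectorizer(sublinear_tf=True, max_df=0.5, stop_words='english', decode_error='strict')

X_train_features = vectorizer.fit_transform(data_train.data)
y_train_labels = data_train.target

# Keep the 1000 features with the highest chi-squared scores
ch2 = SelectKBest(chi2, k=1000)
X_train_features = ch2.fit_transform(X_train_features, y_train_labels)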

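After selecting the k best features, I want to print the names of the selected features (the actual words). Is there a way to do this? I only need to print the selected feature names. Should I perhaps use CountVectorizer instead?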

Answer 1:

The following should work:

# On scikit-learn older than 1.0, call vectorizer.get_feature_names() instead
np.asarray(vectorizer.get_feature_names_out())[ch2.get_support()]
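
To try the idea end to end, here is a minimal sketch on a toy corpus. The documents, labels, k=3 and the variable names are all made up for illustration and are not from the question. The hasattr check is only there so the snippet runs on both older and newer scikit-learn releases.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Toy corpus and labels, invented for illustration only
docs = [
    "the cat sat on the mat",
    "cats and kittens purr",
    "the dog chased the ball",
    "dogs and puppies bark",
]
labels = [0, 0, 1, 1]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)

# Keep the 3 features with the highest chi-squared scores
selector = SelectKBest(chi2, k=3)
X_selected = selector.fit_transform(X, labels)

# get_feature_names_out() exists in scikit-learn >= 1.0; older releases only have get_feature_names()
if hasattr(vectorizer, "get_feature_names_out"):
    names = np.asarray(vectorizer.get_feature_names_out())
else:
    names = np.asarray(vectorizer.get_feature_names())

# Boolean mask with one entry per original column, True where the column was kept
mask = selector.get_support()
selected_names = names[mask]
print(selected_names)
```

The mask has the same length as the vectorizer's vocabulary, so indexing the name array with it returns the names in the same column order as X_selected.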


Answer 2:

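To expand on @ogrisel's answer: the returned list of features is in the same order as when they were vectorized. The code below gives you the top-ranked features sorted by their chi-squared score in descending order, together with the corresponding p-values: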

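# (index, score) pairs for the 1000 highest chi-squared scores, best first
top_ranked_features = sorted(enumerate(ch2.scores_), key=lambda x: x[1], reverse=True)[:1000]
top_ranked_features_indices = [index for index, score in top_ranked_features]
# On scikit-learn older than 1.0, call vectorizer.get_feature_names() instead
feature_names = np.asarray(vectorizer.get_feature_names_out())
for feature_pvalue in zip(feature_names[top_ranked_features_indices], ch2.pvalues_[top_ranked_features_indices]):
    print(feature_pvalue)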
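
As a rough Python 3 sketch of the same ranking, the snippet below sorts only the selected features with np.argsort. The corpus, labels and k=4 are invented for illustration. It uses CountVectorizer to show that the name lookup works the same way with either vectorizer, since both expose their vocabulary in column order.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Toy corpus and labels, invented for illustration only
docs = [
    "free prize click now",
    "click to win a free prize",
    "meeting agenda for monday",
    "monday project meeting notes",
]
labels = [1, 1, 0, 0]

vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)
selector = SelectKBest(chi2, k=4).fit(X, labels)

# get_feature_names_out() exists in scikit-learn >= 1.0; older releases only have get_feature_names()
if hasattr(vectorizer, "get_feature_names_out"):
    names = np.asarray(vectorizer.get_feature_names_out())
else:
    names = np.asarray(vectorizer.get_feature_names())

# Column indices of the kept features, reordered so the highest score comes first
selected = selector.get_support(indices=True)
order = selected[np.argsort(selector.scores_[selected])[::-1]]

ranked = [(names[i], selector.scores_[i], selector.pvalues_[i]) for i in order]
for name, score, pvalue in ranked:
    print(f"{name}\t{score:.3f}\t{pvalue:.3f}")
```

Features with tied scores may come out in any relative order, so treat the ranking among ties as arbitrary.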


Source: show feature names after feature selection