Obtain the Clustered Documents of DBSCAN

Posted 2019-09-21 23:14

I am trying to use DBSCAN (from scikit-learn) to cluster text documents. I use TF-IDF (TfidfVectorizer in sklearn) to create the feature vector for each document.

However, I have not found a way to obtain (print) the documents that are clustered by DBSCAN.

DBSCAN in sklearn provides an attribute called 'labels_', which gives the cluster label of each sample (e.g. 1, 2, 3, or -1 for noise). But I want to get the documents that DBSCAN has clustered, not just the cluster labels.

To emphasize, I want to know which documents belong to each cluster. Could you please suggest ways to do this?

Thank you very much!

Tags: machine-learning scikit-learn hierarchical-clustering dbscan

1 answer

女痞

#2 · 2019-09-22 00:09

Use the labels to select documents.

X[labels_ == 1,:]

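This should return the rows of X for all documents in cluster 1. Here labels_ is the labels_ array of the fitted DBSCAN object, and the same boolean mask can be applied to an array holding the original texts.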
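To get every cluster at once rather than one label at a time, the labels can be zipped with the original texts. Below is a minimal sketch of that idea, not the only way to do it. The helper name documents_by_cluster and the sample docs and labels are made up for illustration. It assumes docs is the same list, in the same order, that was passed to TfidfVectorizer, so that labels[i] describes docs[i].

```python
from collections import defaultdict


def documents_by_cluster(docs, labels):
    """Group documents by cluster label (hypothetical helper).

    `labels` is expected to line up with `docs` one-to-one, as
    DBSCAN's `labels_` does with the rows it was fitted on.
    The label -1 is DBSCAN's marker for noise points.
    """
    clusters = defaultdict(list)
    for doc, label in zip(docs, labels):
        # int() turns NumPy integer labels into plain Python ints.
        clusters[int(label)].append(doc)
    return dict(clusters)


# Illustrative data only. In practice `labels` would be `db.labels_`
# after `db = DBSCAN(...).fit(X)`, with
# `X = TfidfVectorizer().fit_transform(docs)`.
docs = ["cats purr", "cats meow", "stocks fell", "stocks rose", "hello"]
labels = [0, 0, 1, 1, -1]

for label, members in sorted(documents_by_cluster(docs, labels).items()):
    name = "noise" if label == -1 else f"cluster {label}"
    print(name, members)
```

Because the grouping uses only the label array and the original list, it works whether X is a dense array or the sparse matrix that TfidfVectorizer returns.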
