I am trying to use DBSCAN (from scikit-learn) to cluster text documents. I use TF-IDF (TfidfVectorizer in sklearn) to build the feature vector for each document.
However, I have not found a way to obtain (print) the documents that DBSCAN assigns to each cluster.
DBSCAN in sklearn provides an attribute called 'labels_', which gives the cluster label of each sample (e.g. 1, 2, 3, or -1 for noise). But I want the documents themselves, not just the cluster labels.
To emphasize, I want to know which documents belong to each cluster. Could you please suggest ways to do this?
Thank you very much!
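Use the labels to select documents. The labels_ array has one entry per input document, in the same order as the input, so selecting the documents whose label is 1
should give all documents in cluster 1.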
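A minimal sketch of that selection, assuming the documents are kept in a Python list in the same order they were passed to the vectorizer. The toy corpus, the variable names (docs, clusters), and the eps, min_samples and metric values are illustrative assumptions, not taken from the question:

```python
from collections import defaultdict

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; replace with your own documents.
docs = [
    "the cat sat on the mat",
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "stock prices fell sharply today",
    "a completely unrelated sentence about gardening",
]

# One TF-IDF row per document, in the same order as docs.
X = TfidfVectorizer().fit_transform(docs)

# eps, min_samples and metric are illustrative values, not tuned recommendations.
db = DBSCAN(eps=0.5, min_samples=2, metric="cosine").fit(X)

# db.labels_[i] is the cluster label of docs[i]; -1 marks noise.
clusters = defaultdict(list)
for doc, label in zip(docs, db.labels_):
    clusters[int(label)].append(doc)

# Print every cluster with the documents it contains.
for label, members in sorted(clusters.items()):
    print(label, members)

# Documents of a single cluster, here cluster 1.
cluster_1_docs = [doc for doc, label in zip(docs, db.labels_) if label == 1]
```

Note that scikit-learn numbers the clusters starting from 0, so cluster 1 is the second cluster found. Grouping into a dictionary keeps the original strings, which is usually more convenient for text than indexing the sparse TF-IDF matrix.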