I am trying to use DBSCAN (from scikit-learn) to cluster text documents. I use TF-IDF (TfidfVectorizer in sklearn) to create the features for each document.
However, I have not found a way to obtain (print) the documents that DBSCAN assigns to each cluster.
DBSCAN in sklearn provides an attribute called 'labels_', which gives the cluster label of each sample (e.g. 0, 1, 2, and -1 for noise). However, I want to get the documents in each cluster, not just the cluster labels.
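
A minimal sketch of the setup described above, assuming a hypothetical toy corpus and illustrative values for eps, min_samples, and the cosine metric (none of these come from the original setup). Since 'labels_' holds one label per input row, in the same order as the documents passed to the vectorizer, pairing the two with zip is one possible way to list the documents in each cluster:

```python
from collections import defaultdict

from sklearn.cluster import DBSCAN
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus, for illustration only.
documents = [
    "the cat sat on the mat",
    "the cat sat on the mat today",
    "stock markets fell sharply",
    "stock markets fell sharply again",
    "quantum entanglement experiment",
]

# One TF-IDF row per document, in the same order as `documents`.
X = TfidfVectorizer().fit_transform(documents)

# eps, min_samples, and metric are assumed values; tune them for real data.
db = DBSCAN(eps=0.7, min_samples=2, metric="cosine").fit(X)

# labels_[i] is the cluster label of documents[i]; -1 marks noise.
clusters = defaultdict(list)
for doc, label in zip(documents, db.labels_):
    clusters[int(label)].append(doc)

for label, docs in sorted(clusters.items()):
    print(f"cluster {label}:")
    for doc in docs:
        print(f"  {doc}")
```

The same pairing works with document IDs or file names in place of the raw text, as long as that list is kept in the order used to fit the vectorizer.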
To emphasize, I want to know which documents belong to each cluster. Could you please suggest a way to do this?
Thank you very much!