I have a text corpus that contains 1000+ articles each in a separate line. I am trying to use Hierarchy Clustering using Scipy in python to produce clusters of related articles. This is the code I used to do the clustering
# Agglomerative Clustering
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as hac
tree = hac.linkage(X.toarray(), method="complete",metric="euclidean")
plt.clf()
hac.dendrogram(tree)
plt.show()
and I got this plot
Then I cut off the tree at the third level with fcluster()
from scipy.cluster.hierarchy import fcluster
clustering = fcluster(tree,3,'maxclust')
print(clustering)
and I got this output: [2 2 2 ..., 2 2 2]
My question is how can I find the top 10 frequent words in each cluster in order to suggest a topic for each cluster?