Clustering from the cosine similarity values

2020-02-29 13:02发布

问题:

I have extracted words from a set of URLs and calculated cosine similarity between each URL's contents.And also I have normalized the values between 0-1(using Min-Max).Now i need to cluster the URLs based on cosine similarity values to find out similar URLs.which clustering algorithm will be most suitable?.Please suggest me a Dynamic clustering method because it will be useful since i could increase number of URL's on demand and also it will be more natural.Please correct me if you feel i'm making the progress in a wrong way.Thanks in anticipation.

回答1:

K-means clustering can be used for online learning, you just need to select the number of clusters a priori. Also, I think you shouldn't normalize your data, because cosine already provides values in the range [0:1]. Your Min-Max normalization could lead to information loss.