I have extracted words from a set of URLs and calculated the cosine similarity between each URL's contents. I have also normalized the values to the range 0-1 (using Min-Max). Now I need to cluster the URLs based on the cosine similarity values to find similar URLs. Which clustering algorithm would be most suitable? Please suggest a dynamic clustering method, since I may increase the number of URLs on demand and that would also feel more natural. Please correct me if you think I'm going about this the wrong way. Thanks in anticipation.
Answer 1:
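K-means clustering can be used for online learning; you just need to select the number of clusters a priori. Also, I think you shouldn't normalize your data: for non-negative word-count vectors, cosine similarity already gives values in the range [0, 1]. Your Min-Max normalization could lead to information loss, because it rescales every score relative to the smallest and largest similarity in the current set of URLs, and those extremes can shift whenever new URLs are added.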
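
To make the online idea concrete, here is a minimal sketch of sequential k-means that assigns each incoming page to the centroid with the highest cosine similarity and then nudges that centroid toward the page. Everything in it is an assumption for illustration: the `OnlineKMeans` class, the `unit_vector` helper, the toy page texts, and the fixed vocabulary (in a real stream the vocabulary grows, so you would need something like feature hashing). It is not the asker's pipeline, just one way the approach could look.

```python
import math
from collections import Counter


def unit_vector(text, vocab):
    """Bag-of-words counts over a fixed vocabulary, scaled to unit length."""
    counts = Counter(text.lower().split())
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec


def cosine(a, b):
    """Cosine similarity; lies in [0, 1] when all components are non-negative."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class OnlineKMeans:
    """Sequential k-means: each new point updates only its nearest centroid."""

    def __init__(self, k):
        self.k = k
        self.centroids = []  # running means of the assigned vectors
        self.counts = []     # number of points seen per cluster

    def predict(self, vec):
        # Nearest centroid = the one with the highest cosine similarity.
        return max(range(len(self.centroids)),
                   key=lambda j: cosine(vec, self.centroids[j]))

    def partial_fit(self, vec):
        # The first k points seed the centroids (so arrival order matters).
        if len(self.centroids) < self.k:
            self.centroids.append(list(vec))
            self.counts.append(1)
            return len(self.centroids) - 1
        j = self.predict(vec)
        self.counts[j] += 1
        rate = 1.0 / self.counts[j]  # running-mean update
        self.centroids[j] = [c + rate * (x - c)
                             for c, x in zip(self.centroids[j], vec)]
        return j


# Hypothetical page contents, fed one at a time as they "arrive".
stream = [
    "python code tutorial",
    "football match score",
    "python code example",
    "football score league",
]
vocab = sorted({w for doc in stream for w in doc.split()})

model = OnlineKMeans(k=2)
labels = [model.partial_fit(unit_vector(doc, vocab)) for doc in stream]
print(labels)  # [0, 1, 0, 1]
```

Note that the result depends on which pages happen to seed the centroids, and `k` still has to be chosen up front, as mentioned above. If you would rather use a library, scikit-learn's `MiniBatchKMeans` has a `partial_fit` method that follows the same incremental pattern (with Euclidean distance, so you would L2-normalize the vectors first).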