Algorithm for clustering people with similar interests

Posted 2019-04-12 15:11

Question:

I want to cluster people into groups based on their interests. For example, people who like machine learning and graphs might be placed in one group, and people interested in mathematics, economics, etc. in a different group.

The algorithm should decide which people have the most closely matching interests and create clusters from them. It should also be able to list the other people in the group to which a particular person has been assigned.

Answer 1:

This does not sound like a particularly difficult clustering problem, and any of the off-the-shelf clustering algorithms will probably work well. If you know how many clusters you want, try k-means or k-medoids clustering. If you don't, try agglomerative clustering.
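When the number of groups is not known in advance, agglomerative clustering can decide it through a stopping threshold. Below is a minimal pure-Python sketch, assuming each person is described by a set of interest strings. The choice of Jaccard distance, average linkage and the 0.6 threshold are illustrative assumptions, and the names and interests are invented sample data.

```python
from itertools import combinations

# Hypothetical sample data: each person maps to a set of interests.
people = {
    "alice": {"machine learning", "graphs"},
    "bob": {"machine learning", "graphs", "statistics"},
    "carol": {"mathematics", "economics"},
    "dave": {"economics", "mathematics", "finance"},
    "erin": {"graphs", "machine learning"},
}

def jaccard_distance(a, b):
    """1 - |intersection| / |union|; 0.0 means identical interest sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)

def average_linkage(c1, c2, data):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(data[p], data[q]) for p in c1 for q in c2)
    return total / (len(c1) * len(c2))

def agglomerative(data, threshold=0.6):
    """Repeatedly merge the closest pair of clusters until none is within threshold."""
    clusters = [[name] for name in data]  # start with one cluster per person
    while len(clusters) > 1:
        candidates = (
            ((i, j), average_linkage(clusters[i], clusters[j], data))
            for i, j in combinations(range(len(clusters)), 2)
        )
        (i, j), dist = min(candidates, key=lambda item: item[1])
        if dist > threshold:
            break  # the closest remaining clusters are too dissimilar to merge
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters

print(agglomerative(people))  # [['alice', 'erin', 'bob'], ['carol', 'dave']]
```

The threshold plays the role that `k` plays in k-means: raising it yields fewer, broader groups. Library implementations of agglomerative clustering offer the same idea with far better performance on large data.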

The difficult part of the problem will be the features. You mentioned that 'interests' could be used as the features upon which to cluster, but feature engineering and selection will always involve some trial and error.
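A common first representation for this trial and error, though not one the answer prescribes, is a multi-hot vector: one column per distinct interest, set to 1 if the person lists it. The sketch below assumes interests arrive as free-text lists and normalizes only case and surrounding whitespace. The profiles are hypothetical.

```python
def build_vocabulary(profiles):
    """Collect every distinct interest (lower-cased, stripped) in sorted order."""
    return sorted({i.strip().lower() for interests in profiles.values() for i in interests})

def multi_hot(interests, vocabulary):
    """Return a 0/1 vector with a 1 wherever the person lists that interest."""
    normalized = {i.strip().lower() for i in interests}
    return [1 if term in normalized else 0 for term in vocabulary]

# Hypothetical raw profiles with inconsistent capitalization and spacing.
profiles = {
    "alice": ["Machine Learning", "Graphs"],
    "carol": ["mathematics", "Economics "],
}

vocab = build_vocabulary(profiles)
vectors = {name: multi_hot(interests, vocab) for name, interests in profiles.items()}
print(vocab)    # ['economics', 'graphs', 'machine learning', 'mathematics']
print(vectors)  # {'alice': [0, 1, 1, 0], 'carol': [1, 0, 0, 1]}
```

Further feature work, such as merging synonyms or dropping interests that almost everyone lists, would slot in at the normalization step.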



Answer 2:

Without more context about your problem, I can't give a definitive answer. Most clustering algorithms will work, though; the real question is how "good" your results are. I put "good" in quotes because you'll need some metric to measure it (generally inter-cluster and intra-cluster distance).
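Here is a small sketch of the intra-cluster versus inter-cluster comparison. It assumes people are represented as interest sets and uses Jaccard distance, which is an illustrative choice. A good clustering should show a low mean distance within clusters and a high mean distance between them. The data and both candidate clusterings are hypothetical.

```python
from itertools import combinations

def jaccard_distance(a, b):
    """1 - |intersection| / |union|; 0.0 means identical interest sets."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)

def _mean(values):
    return sum(values) / len(values) if values else 0.0

def intra_inter(clusters, data):
    """Return (mean intra-cluster distance, mean inter-cluster distance) over person pairs."""
    intra = [jaccard_distance(data[p], data[q])
             for cluster in clusters for p, q in combinations(cluster, 2)]
    inter = [jaccard_distance(data[p], data[q])
             for c1, c2 in combinations(clusters, 2) for p in c1 for q in c2]
    return _mean(intra), _mean(inter)

people = {
    "alice": {"machine learning", "graphs"},
    "bob": {"machine learning", "graphs", "statistics"},
    "carol": {"mathematics", "economics"},
    "dave": {"economics", "mathematics", "finance"},
}
good = [["alice", "bob"], ["carol", "dave"]]
bad = [["alice", "carol"], ["bob", "dave"]]
print(intra_inter(good, people))  # roughly (0.33, 1.0)
print(intra_inter(bad, people))   # roughly (1.0, 0.67)
```

Comparing these two numbers across different algorithms or feature sets gives a concrete way to judge which result is "better".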

Here's the advice I was given when I was taught how to choose a data mining algorithm: try the simplest algorithms first. They are often overlooked but perform quite well (Naive Bayes for supervised learning is a classic example).

To get started, try something like k-means, a simple and popular method. You can find more information at http://en.wikipedia.org/wiki/K-means_clustering (the Software section lists implementations you could try).
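For illustration, here is a minimal pure-Python sketch of Lloyd's k-means algorithm on 0/1 interest vectors. The farthest-first initialization is a deterministic simplification chosen for this sketch (it is not k-means++), and the people and interests are hypothetical. In practice you would use one of the library implementations from the Software section.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_vector(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(coords) / len(points) for coords in zip(*points)]

def kmeans(points, k, max_iter=100):
    """Lloyd's k-means; returns one cluster label (0..k-1) per point."""
    # Deterministic farthest-first initialization (a simplification, not k-means++).
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(squared_distance(p, c) for c in centers)))
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # converged: assignments stopped changing
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:  # keep the old center if a cluster ends up empty
                centers[j] = mean_vector(members)
    return labels

# Hypothetical sample data, encoded as 0/1 vectors over a sorted vocabulary.
people = {
    "alice": {"machine learning", "graphs"},
    "bob": {"machine learning", "graphs", "statistics"},
    "carol": {"mathematics", "economics"},
    "dave": {"economics", "mathematics", "statistics"},
    "erin": {"graphs", "machine learning"},
}
vocab = sorted(set().union(*people.values()))
names = list(people)
vectors = [[1 if term in people[n] else 0 for term in vocab] for n in names]

labels = kmeans(vectors, k=2)
groups = {}
for name, label in zip(names, labels):
    groups.setdefault(label, []).append(name)
print(groups)  # {0: ['alice', 'bob', 'erin'], 1: ['carol', 'dave']}
```

Because the means of 0/1 vectors are fractional, k-means treats interests as numeric coordinates. For purely categorical profiles, k-modes, or k-medoids with a set-based distance such as Jaccard, are common alternatives.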

The second requirement is outputting the other people in a target person's group. This is doable with any clustering algorithm: since you'll end up with X subsets of people, you simply find the subset that contains the target person, then iterate over that subset and print everyone in it.
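The lookup described above can be sketched as follows. The sketch assumes the clustering step has produced a hypothetical `{person: cluster_id}` mapping and inverts it once, so that repeated queries do not have to rescan every person.

```python
from collections import defaultdict

def build_groups(assignments):
    """Invert {person: cluster_id} into {cluster_id: [person, ...]} once, up front."""
    groups = defaultdict(list)
    for person, cluster_id in assignments.items():
        groups[cluster_id].append(person)
    return groups

def others_in_group(target, assignments, groups):
    """Return everyone in the target's cluster except the target."""
    return [p for p in groups[assignments[target]] if p != target]

# Hypothetical output of any clustering step: person -> cluster id.
assignments = {"alice": 0, "bob": 0, "carol": 1, "dave": 1, "erin": 0}
groups = build_groups(assignments)
print(others_in_group("alice", assignments, groups))  # ['bob', 'erin']
print(others_in_group("dave", assignments, groups))   # ['carol']
```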



Answer 3:

I think the right approach is k-means clustering. The most important part of your problem is feature selection.

Start with the features you think are most important and simply apply k-means in a statistical programming language such as R. Inspect the result and improve it by modifying features or selecting more appropriate ones. If you are not sure about feature selection, trial and error can give you insight.

If you can provide some sample data, that would help in giving more specific solutions to your problem.



Answer 4:

It's coming a bit late, but there is actually an app in the Windows Store that does exactly that, finding profiles with similar characteristics. It's called k-modo.