Which clustering algorithm is suitable for one-dim

Posted 2019-05-27 04:32

Question:

I have a one-dimensional list like this:

public class Zeit_und_Eigenschaft
{
    [Feature]
    public double Sekunden { get; set; }
}

//...
List<Zeit_und_Eigenschaft> lzue = new List<Zeit_und_Eigenschaft>();
//fill lzue

For example, the Sekunden values in lzue could be:

lzue.Sekunden
1
2
3
4
8
9
10
22
55
...

The goal is to find clusters in that list, i.e. elements that form natural groups. For instance, in this example:

lzue.Sekunden
1
2
3
4

8
9
10

22

55

Which clustering algorithm is suitable, given that I don't know the number of clusters k? GMM? PCA? k-means? Something else?

Answer 1:

Don't look for clustering algorithms.

Clustering is a good term for multivariate data, but your data is one-dimensional, so you should look at the much older statistics literature instead, e.g. Jenks natural breaks optimization.
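Jenks natural breaks optimization chooses class boundaries that minimize the sum of squared deviations of the values from their class means. As a minimal sketch (not code from this answer), the C# below finds that optimum exactly for a fixed number of classes k using an O(k·n²) dynamic program over the sorted values, rather than reproducing Jenks' original iterative procedure. `NaturalBreaks` and the sample array `sekunden` are hypothetical names; with the question's data you would pass something like `lzue.Select(z => z.Sekunden)`. Note that k still has to be chosen by you.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: split the sorted values into k contiguous classes so that
// the total within-class sum of squared deviations is as small as possible.
static List<double[]> NaturalBreaks(IEnumerable<double> values, int k)
{
    double[] x = values.OrderBy(v => v).ToArray();
    int n = x.Length;
    if (k < 1 || k > n) throw new ArgumentOutOfRangeException(nameof(k));

    // Prefix sums give the squared-deviation cost of any slice in O(1).
    var s1 = new double[n + 1];
    var s2 = new double[n + 1];
    for (int i = 0; i < n; i++)
    {
        s1[i + 1] = s1[i] + x[i];
        s2[i + 1] = s2[i] + x[i] * x[i];
    }
    double Cost(int lo, int hi) // inclusive slice x[lo..hi]
    {
        double sum = s1[hi + 1] - s1[lo];
        return (s2[hi + 1] - s2[lo]) - sum * sum / (hi - lo + 1);
    }

    // best[c, j]: minimal cost of x[0..j] split into c + 1 classes.
    // start[c, j]: first index of the last class in that optimal split.
    var best = new double[k, n];
    var start = new int[k, n];
    for (int j = 0; j < n; j++) best[0, j] = Cost(0, j);
    for (int c = 1; c < k; c++)
        for (int j = c; j < n; j++)
        {
            best[c, j] = double.PositiveInfinity;
            for (int i = c; i <= j; i++) // try x[i..j] as the last class
            {
                double candidate = best[c - 1, i - 1] + Cost(i, j);
                if (candidate < best[c, j]) { best[c, j] = candidate; start[c, j] = i; }
            }
        }

    // Walk the recorded start indices backwards to recover the classes.
    var classes = new List<double[]>();
    int end = n - 1;
    for (int c = k - 1; c >= 0; c--)
    {
        int begin = c == 0 ? 0 : start[c, end];
        classes.Insert(0, x[begin..(end + 1)]);
        end = begin - 1;
    }
    return classes;
}

// Sample values from the question; in practice: lzue.Select(z => z.Sekunden)
var sekunden = new double[] { 1, 2, 3, 4, 8, 9, 10, 22, 55 };
foreach (var cls in NaturalBreaks(sekunden, 4))
    Console.WriteLine(string.Join(", ", cls));
```

With k = 4 the sample values split into {1, 2, 3, 4}, {8, 9, 10}, {22} and {55}, the grouping shown in the question. Because optimal one-dimensional classes are always contiguous runs of the sorted data, this is also the exact one-dimensional k-means solution for that k.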

Or just use kernel density estimation. In fact, this very question has already been asked dozens of times here on Stack Overflow:

1D Number Array Clustering

Cluster one-dimensional data optimally?

partitioning an float array into similar segments (clustering)

Efficiently grouping similar numbers together

Clustering values by their proximity in python (machine learning?)
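The kernel density estimation approach can be sketched as follows, under stated assumptions: the code evaluates a Gaussian kernel density estimate on an evenly spaced grid and cuts the sorted values at every local minimum of the density. The bandwidth is an assumption you have to supply (it effectively replaces the choice of k: smaller bandwidths produce more groups). `KdeGroups`, `bandwidth` and `gridSize` are hypothetical names for illustration, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: group 1-D values by cutting at the local minima of a
// Gaussian kernel density estimate evaluated on an evenly spaced grid.
static List<double[]> KdeGroups(IEnumerable<double> values, double bandwidth, int gridSize = 1000)
{
    if (bandwidth <= 0 || gridSize < 3) throw new ArgumentOutOfRangeException();
    double[] x = values.OrderBy(v => v).ToArray();
    var groups = new List<double[]>();
    if (x.Length == 0) return groups;

    // The grid spans the data plus three bandwidths on each side.
    double lo = x[0] - 3 * bandwidth, hi = x[^1] + 3 * bandwidth;
    double step = (hi - lo) / (gridSize - 1);

    // Unnormalised density: a constant factor does not move the minima.
    var density = new double[gridSize];
    for (int g = 0; g < gridSize; g++)
    {
        double t = lo + g * step;
        foreach (double v in x)
        {
            double u = (t - v) / bandwidth;
            density[g] += Math.Exp(-0.5 * u * u);
        }
    }

    // A grid point lower than its left neighbour and not higher than its
    // right one is a valley; its position becomes a cut point.
    var cuts = new List<double>();
    for (int g = 1; g < gridSize - 1; g++)
        if (density[g] < density[g - 1] && density[g] <= density[g + 1])
            cuts.Add(lo + g * step);
    cuts.Add(double.PositiveInfinity);

    // Sweep the sorted values once, closing a group at each cut.
    int idx = 0;
    foreach (double cut in cuts)
    {
        var group = new List<double>();
        while (idx < x.Length && x[idx] < cut) group.Add(x[idx++]);
        if (group.Count > 0) groups.Add(group.ToArray());
    }
    return groups;
}

// Sample values from the question; the bandwidth of 2 is an arbitrary choice.
var sekunden = new double[] { 1, 2, 3, 4, 8, 9, 10, 22, 55 };
foreach (var grp in KdeGroups(sekunden, bandwidth: 2.0))
    Console.WriteLine(string.Join(", ", grp));
```

With a bandwidth of 2 the sample values fall into {1, 2, 3, 4}, {8, 9, 10}, {22} and {55}. With a bandwidth of 0.1 every value ends up in its own group, so the bandwidth is the knob that needs tuning for real data.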



Answer 2:

There was a good article on this topic in MSDN Magazine a few months ago; it uses the k-means algorithm:

http://msdn.microsoft.com/en-us/magazine/jj891054.aspx

Also, Andrew Ng's online machine learning course includes some videos on k-means clustering:

https://class.coursera.org/ml-003/lecture/preview

When you don't know k, there are methods for finding a good value; do a web search for "k-means elbow".
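To illustrate both suggestions in this answer, here is a minimal sketch (not the MSDN article's code) of plain Lloyd's k-means on one-dimensional data, run for several values of k so the within-cluster sum of squares (SSE) can be inspected for an elbow. The deterministic quantile seeding, `KMeans1D` and `maxIter` are assumptions for illustration; real implementations usually rely on random restarts or k-means++ seeding.

```csharp
using System;
using System.Linq;

// Hypothetical helper: Lloyd's k-means on 1-D data. Centers start at evenly
// spaced quantiles of the sorted values (a deterministic, assumed seeding).
static (double[] Centers, double Sse) KMeans1D(double[] data, int k, int maxIter = 100)
{
    if (k < 1 || k > data.Length) throw new ArgumentOutOfRangeException(nameof(k));
    double[] x = data.OrderBy(v => v).ToArray();
    int n = x.Length;
    var centers = new double[k];
    for (int c = 0; c < k; c++)
        centers[c] = x[(int)((c + 0.5) * n / k)];

    var label = new int[n];
    for (int iter = 0; iter < maxIter; iter++)
    {
        // Assignment step: attach every value to its nearest center.
        bool changed = false;
        for (int i = 0; i < n; i++)
        {
            int nearest = 0;
            for (int c = 1; c < k; c++)
                if (Math.Abs(x[i] - centers[c]) < Math.Abs(x[i] - centers[nearest])) nearest = c;
            if (iter == 0 || nearest != label[i]) { label[i] = nearest; changed = true; }
        }
        if (!changed) break; // assignments are stable: converged

        // Update step: move each center to the mean of its values
        // (an empty cluster simply keeps its previous center).
        for (int c = 0; c < k; c++)
        {
            var members = x.Where((_, i) => label[i] == c).ToArray();
            if (members.Length > 0) centers[c] = members.Average();
        }
    }

    // Within-cluster sum of squared distances to the assigned centers.
    double sse = x.Select((v, i) => (v - centers[label[i]]) * (v - centers[label[i]])).Sum();
    return (centers, sse);
}

// Elbow check on the sample values: print the SSE for k = 1..6.
var sekunden = new double[] { 1, 2, 3, 4, 8, 9, 10, 22, 55 };
for (int k = 1; k <= 6; k++)
{
    var (centers, sse) = KMeans1D(sekunden, k);
    Console.WriteLine($"k = {k}: SSE = {sse:F2}, centers = {string.Join(", ", centers.Select(m => m.ToString("F2")))}");
}
```

Print or plot the SSE against k and pick the k after which the curve stops dropping sharply. For the sample values, k = 4 converges to the centers 2.5, 9, 22 and 55 with an SSE of 7, i.e. the grouping from the question. Lloyd's algorithm only finds a local optimum, so on other data the curve may not decrease smoothly.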