Clustering algorithm with different epsilons on di

2019-07-09 02:11发布

问题:

I am looking for a clustering algorithm such a s DBSCAN do deal with 3d data, in which is possible to set different epsilons depending on the axis. So for instance an epsilon of 10m on the x-y plan, and an epsilon 0.2m on the z axis.

Essentially, I am looking for large but flat clusters.

Note: I am an archaeologist, the algorithm will be used to look for potential correlations between objects scattered in large surfaces, but in narrow vertical layers

回答1:

Solution 1:

Scale your data set to match your desired epsilon.

In your case, scale z by 50.

Solution 2:

Use a weighted distance function.

E.g. WeightedEuclideanDistanceFunction in ELKI, and choose your weights accordingly, e.g. -distance.weights 1,1,50 will put 50x as much weight on the third axis.

This may be the most convenient option, since you are already using ELKI.



回答2:

Just define a custom distance metric when computing the DBSCAN core points. The standard DBSCAN uses the Euclidean distance to compute points within an epsilon. So all dimensions are treated the same.

However, you could use the Mahalanobis distance to weigh each dimension differently. You can use a diagonal covariance matrix for flat clusters. You can use a full symmetric covariance matrix for flat tilted clusters, etc.

In your case, you would use a covariance matrix like:

100  0    0   
  0  100  0   
  0    0  0.04

In the pseudo code provided at the Wikipedia entry for DBSCAN just use one of the distance metrics suggested above in the regionQuery function.

Update

Note: scaling the data is equivalent to using an appropriate metric.