Using ELKI's Distance Function

Posted 2020-04-22 04:50

This is a follow-up to a previous question, where we noted that using Euclidean distances with lat, long coordinates does not yield correct results. I read in the documentation that ELKI supports geographic data, namely in its distance functions, which the various clustering algorithms use. In the ELKI user interface, I can see options to replace the default distance function (Euclidean) with a better-suited one. I also see that in that case you need to provide a datum, which makes sense, since you have to tell ELKI how the data is projected. My options in the UI are "geo.LngLatDistanceFunction", since I am using (x,y) coordinates, and "WGS84SpheroidEarthModel", since the data is in EPSG:4326. I am trying to parameterize my algorithm accordingly in Java, but I am not sure how to do it. If I initialize my parameters like this:

ListParameterization params2 = new ListParameterization();
params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.MINPTS_ID, minPoints);
params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN.Parameterizer.EPSILON_ID, epsilon);

Could I set the distance function like this?

params2.addParameter(de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm.DISTANCE_FUNCTION_ID, 
            de.lmu.ifi.dbs.elki.distance.distancefunction.geo.LngLatDistanceFunction.class);

What about the geo.model? (I have no clue about this)

1 answer
霸刀☆藐视天下
#2 · 2020-04-22 05:18

The default earth model is SphericalVincentyEarthModel, which is supposedly a bit faster but assumes a spherical earth instead of a spheroid. This should not make much of a difference unless you need precision to the meter: the maximum error should be 0.3% of the distance, according to this answer.
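To illustrate why a spherical model is already a large improvement over Euclidean distance on raw degrees, here is a minimal sketch in plain Java. It uses the haversine formula, which is a stand-in for illustration and not ELKI's actual SphericalVincentyEarthModel code. The class name, method names and the mean radius constant are all assumptions made for this example; the longitude-first argument order is chosen only to mirror the name of LngLatDistanceFunction.

```java
public class GreatCircleSketch {
    // Mean earth radius in meters (an assumption for this sketch;
    // ELKI's earth models define their own constants).
    static final double EARTH_RADIUS_M = 6371008.8;

    /** Haversine great-circle distance in meters. Inputs are degrees, longitude first. */
    static double haversineMeters(double lng1, double lat1, double lng2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDPhi = Math.sin((phi2 - phi1) / 2);
        double sinHalfDLambda = Math.sin(Math.toRadians(lng2 - lng1) / 2);
        double a = sinHalfDPhi * sinHalfDPhi
            + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;
        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Naive Euclidean distance computed directly on the degree values. */
    static double euclideanDegrees(double lng1, double lat1, double lng2, double lat2) {
        return Math.hypot(lng2 - lng1, lat2 - lat1);
    }

    public static void main(String[] args) {
        // One degree of longitude at the equator and at 60 degrees north.
        // Euclidean distance on degrees treats both pairs as equally far apart,
        // while on the sphere the northern pair is about half as far in meters.
        System.out.printf("equator: euclid=%.3f deg, sphere=%.1f m%n",
            euclideanDegrees(0, 0, 1, 0), haversineMeters(0, 0, 1, 0));
        System.out.printf("60N:     euclid=%.3f deg, sphere=%.1f m%n",
            euclideanDegrees(0, 60, 1, 60), haversineMeters(0, 60, 1, 60));
    }
}
```

The remaining difference between a sphere and the WGS84 spheroid is the small error discussed above, which is why the choice of earth model matters far less than moving away from Euclidean distance.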

To set the earth model parameter, use EarthModel.MODEL_ID as the option ID (as referenced by the Parameterizer of LngLatDistanceFunction). When trying to find the appropriate option ID, always have a look at the Parameterizers - we are slowly moving all the option IDs into the Parameterizers.
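Putting the question's parameters together with the option ID named above, the configuration might look like the following sketch. It needs ELKI on the classpath and assumes the ELKI 0.7.x package layout (de.lmu.ifi.dbs.elki), including the assumption that EarthModel and WGS84SpheroidEarthModel live in the math.geodesy package; the class name GeoDbscanParams and the method build are hypothetical helpers, and minPoints and epsilon are the values from the question. Check the package names against the ELKI version you actually use.

```java
import de.lmu.ifi.dbs.elki.algorithm.DistanceBasedAlgorithm;
import de.lmu.ifi.dbs.elki.algorithm.clustering.DBSCAN;
import de.lmu.ifi.dbs.elki.distance.distancefunction.geo.LngLatDistanceFunction;
import de.lmu.ifi.dbs.elki.math.geodesy.EarthModel;
import de.lmu.ifi.dbs.elki.math.geodesy.WGS84SpheroidEarthModel;
import de.lmu.ifi.dbs.elki.utilities.optionhandling.parameterization.ListParameterization;

// Hypothetical helper class for illustration; not part of ELKI.
public class GeoDbscanParams {
    static ListParameterization build(int minPoints, double epsilon) {
        ListParameterization params = new ListParameterization();
        // DBSCAN parameters, as in the question.
        params.addParameter(DBSCAN.Parameterizer.MINPTS_ID, minPoints);
        params.addParameter(DBSCAN.Parameterizer.EPSILON_ID, epsilon);
        // Replace the default Euclidean distance with the geodetic one.
        params.addParameter(DistanceBasedAlgorithm.DISTANCE_FUNCTION_ID,
            LngLatDistanceFunction.class);
        // Earth model option ("geo.model" in the UI), set via EarthModel.MODEL_ID.
        params.addParameter(EarthModel.MODEL_ID, WGS84SpheroidEarthModel.class);
        return params;
    }
}
```

If the earth model line is left out, the distance function falls back to the default spherical model described above.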
