Starting with latitude/longitude data (in radians), I’m trying to efficiently find the nearest n neighbors, ideally with geodesic (WGS-84) distance.
Right now I’m using sklearn’s BallTree with haversine distance (KD-Trees only accept Minkowski metrics), which is nice and fast (3-4 seconds to find the nearest 5 neighbors for 1,200 locations among 7,500 possible matches), but not as accurate as I need. Code:
from sklearn.neighbors import BallTree

# the haversine metric expects coordinates as (lat, lon) in radians
tree = BallTree(possible_matches[['x', 'y']], leaf_size=2, metric='haversine')
distances, indices = tree.query(locations[['x', 'y']], k=5)
When I substitute a custom function for the metric (metric=lambda u, v: geopy.distance.geodesic(u, v).miles) it takes an "unreasonably" long time (4 minutes for the same case as above). The documentation does note that custom metric functions can be slow, but that doesn't help me solve my problem.
I looked at using a KD-Tree with ECEF coordinates and Euclidean distance, but I’m not sure whether that’s actually any more accurate.
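For concreteness, the ECEF route I have in mind looks roughly like this (a sketch only: I'm assuming my 'x' column is latitude and 'y' is longitude, both in radians, and treating every point as sitting on the WGS-84 ellipsoid at height zero):

import numpy as np
from sklearn.neighbors import KDTree

def latlon_rad_to_ecef(lat, lon, height=0.0):
    # WGS-84 ellipsoid constants
    a = 6378137.0                  # semi-major axis (metres)
    f = 1 / 298.257223563          # flattening
    e2 = f * (2 - f)               # first eccentricity squared
    n = a / np.sqrt(1 - e2 * np.sin(lat) ** 2)   # prime vertical radius of curvature
    x = (n + height) * np.cos(lat) * np.cos(lon)
    y = (n + height) * np.cos(lat) * np.sin(lon)
    z = (n * (1 - e2) + height) * np.sin(lat)
    return np.column_stack([x, y, z])

matches_ecef = latlon_rad_to_ecef(possible_matches['x'].values, possible_matches['y'].values)
locations_ecef = latlon_rad_to_ecef(locations['x'].values, locations['y'].values)

tree = KDTree(matches_ecef, leaf_size=2)
# Euclidean distances here are straight-line chords (metres), not surface distances
distances, indices = tree.query(locations_ecef, k=5)

My understanding is that the Euclidean distance between ECEF points is the chord through the Earth, so it slightly underestimates the surface distance, although for nearby points the nearest-neighbor ordering should mostly agree.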
How can I keep the speed of my current method, but improve my distance accuracy?
The main reason your metric is slow is that it is written in Python, while the built-in metrics in sklearn are implemented in Cython/C++/C.
So, as discussed here for Random Forests or here, for example, you would have to implement your metric in Cython, fork your own version of BallTree, and include your custom metric there.
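As a quick sanity check, you can see which metrics get the fast compiled path by inspecting the list BallTree ships with; anything outside this list (in particular a Python callable like your geopy lambda) is evaluated pair by pair in Python:

from sklearn.neighbors import BallTree

# metrics in this list are implemented in Cython and run at C speed;
# a custom Python callable is invoked once per candidate pair instead
print(BallTree.valid_metrics)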