Find the closest coordinate from a set of coordinates

Posted 2020-07-22 18:24

I have a set of about 1000 geographical coordinates (lat, long). Given one coordinate, I want to find the closest one in that set. My approach was to measure the distance to every point, but at hundreds of requests per second, doing all that math can be rough on the server.

What is the best optimized solution for this?

Thanks

4 answers
三岁会撩人
Reply 2 · 2020-07-22 18:45

You will want to use the 'Nearest Neighbor Algorithm'.
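As a baseline for comparison, a brute-force nearest-neighbour scan over the set looks roughly like this (a minimal Python sketch; the `haversine` helper and the sample city coordinates are illustrative assumptions, not from the original post):

```python
import math

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest(target, points):
    """Linear O(n) scan: return the point in `points` closest to `target`."""
    return min(points, key=lambda p: haversine(target[0], target[1], p[0], p[1]))

# Hypothetical sample: which of these is closest to Paris (48.85, 2.35)?
cities = [(51.5, -0.13), (40.7, -74.0), (52.5, 13.4)]  # London, New York, Berlin
closest = nearest((48.85, 2.35), cities)
```

At ~1000 points this scan is already cheap per query; the indexing approaches in the other answers start to matter as the point set or the request rate grows much larger.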

forever°为你锁心
Reply 3 · 2020-07-22 18:51

You can use the sphere-knn library, or look at something like PostGIS.

乱世女痞
Reply 4 · 2020-07-22 18:53

Why not select the potential closest points from the set first? E.g. set a threshold, say 0.1, and filter the set down to the points within ±0.1 of your target point on both axes. Then do the actual calculations only on that subset.

If none are within the first range, just enlarge it (0.2) and repeat (0.3, 0.4, ...) until you've got a match. Obviously you would tune the threshold so that it best matches your likely results.

(I'm assuming the time-consuming bit is the actual distance calculation, so the idea is to limit the number of calculations.)
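A minimal sketch of this expanding bounding-box filter (the 0.1 step, the `haversine` helper, and the sample points are illustrative assumptions; note one caveat the sketch inherits from the approach itself: the nearest candidate inside the box is not strictly guaranteed to be the global nearest, since a point just outside the box can be closer than a diagonal candidate inside it):

```python
import math

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin(math.radians(lat2 - lat1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))

def nearest_filtered(target, points, step=0.1):
    """Grow a +-threshold window around `target` until it contains candidates,
    then run the exact distance calculation only on those candidates."""
    lat, lon = target
    threshold = step
    while True:
        candidates = [p for p in points
                      if abs(p[0] - lat) <= threshold and abs(p[1] - lon) <= threshold]
        if candidates:
            return min(candidates, key=lambda p: haversine(lat, lon, p[0], p[1]))
        threshold += step  # no match yet: enlarge the window and retry
```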

女痞
Reply 5 · 2020-07-22 19:01

An Algorithmic Response

Your approach is already O(n) in time. It's algorithmically very fast, and fairly simple to implement.

If that is not enough, you should consider taking a look at R-trees. The idea behind R-trees is roughly paraphrased as follows:

  1. You already have a set of n elements. You can preprocess this data to form rough 'squares' of regions each containing a set of points, with an established boundary.
  2. Now say a new element comes in. Instead of comparing it against every coordinate, you identify which 'square' it belongs in by comparing the point against the square boundaries, and then measure the distance only against the points inside that square.

You can see at once the benefits:

  • You are no longer comparing against all coordinates, but instead only the boundaries (strictly less than the number of all elements) and then against the number of coordinates within your chosen boundary (also less than the number of all elements).
  • The worst case of such a query is still O(n) time, but on average it runs in roughly O(log n).

The improvement comes mostly from the pre-processing step (which is 'free' in the sense that it is a one-time cost) and from the reduced number of comparisons per query.
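A full R-tree implementation is fairly long, so here is a minimal sketch of the same preprocess-then-restrict idea using a fixed grid of square cells (the `GridIndex` name, the 1.0-degree cell size, the sample points, and the squared-degree ranking are all illustrative assumptions; a production version would use a real R-tree or k-d tree library and a proper great-circle distance):

```python
import math
from collections import defaultdict

class GridIndex:
    """Simplified stand-in for an R-tree: preprocessing buckets every point
    into a square grid cell, so a query only measures distances against the
    points in (or near) the target's cell instead of the whole set."""

    def __init__(self, points, cell_size=1.0):
        # One-time preprocessing cost, paid before any queries arrive.
        self.cell_size = cell_size
        self.cells = defaultdict(list)
        for lat, lon in points:
            self.cells[self._key(lat, lon)].append((lat, lon))

    def _key(self, lat, lon):
        return (math.floor(lat / self.cell_size), math.floor(lon / self.cell_size))

    def nearest(self, lat, lon):
        """Search the target's cell, expanding ring by ring until candidates
        appear (assumes a non-empty point set)."""
        ci, cj = self._key(lat, lon)
        ring = 0
        while True:
            candidates = []
            for i in range(ci - ring, ci + ring + 1):
                for j in range(cj - ring, cj + ring + 1):
                    candidates.extend(self.cells.get((i, j), ()))
            if candidates:
                # Squared degrees are a crude ranking; swap in a haversine
                # distance for correct great-circle ordering.
                return min(candidates,
                           key=lambda p: (p[0] - lat) ** 2 + (p[1] - lon) ** 2)
            ring += 1
```

As with the bounding-box answer above, the first non-empty ring is not strictly guaranteed to hold the global nearest point; real R-trees handle this by tracking distance bounds on each region.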

A Systemic Response

Just buy another server, and distribute the requests and the elements using a load balancer such as HAProxy.

Servers are fairly cheap relative to engineering time, and if the service is critical to your business, adding capacity is an easy way to scale.
