I have about 1000 set of geographical coordinates (lat, long). Given one coordinate i want to find the closest one from that set. My approach was to measure the distance but on hundreds requests per second can be a little rough to the server doing all that math.
What is the best optimized solution for this?
Thanks
You will want to use the 'Nearest Neighbor Algorithm'.
You can use this library
sphere-knn
, or look at something likePostGIS
.Why not select the potential closest points from the set (eg set a threshold, say, 0.1 and filter the set so that you have any points with +-0.1 in both axes from your target point). Then do that actual calcs on this set.
If none are within the first range, just enlarge it (0,2) and repeat (0.3, 0.4...) until you've got a match. Obviously you would tune the threshold so it best matched your likely results.
(I'm assuming the time-consulming bit is the actual distance calculation, so the idea is to limit the number of calculations.)
An Algorithmic Response
Your approach is already
O(n)
in time. It's algorithmically very fast, and fairly simple to implement.If that is not enough, you should consider taking a look at R-trees. The idea behind R-trees is roughly paraphrased as follows:
You can see at once the benefits:
O(n)
time. The lower bound may, on average, beO(log n)
.The main improvement is mostly in the pre-processing step (which is 'free' in that it's a one-time cost) and in the reduced number of comparisons needed.
A Systemic Response
Just buy another server, and distribute the requests and the elements using a load balancer such as Haproxy.
Servers are fairly cheap, especially if they are critical to your business, and if you want to be fast, it's an easy way to scale.