The Problem
Imagine I am stood in an airport. Given a geographic coordinate pair, how can one efficiently determine which airport I am stood in?
Inputs
- A coordinate pair
(x,y)
representing the location I am stood at. - A set of coordinate pairs
[(a1,b1), (a2,b2)...]
where each coordinate pair represents one airport.
Desired Output
A coordinate pair (a,b)
from the set of airport coordinate pairs representing the closest airport to the point (x,y)
.
Inefficient Solution
Here is my inefficient attempt at solving this problem. It is clearly linear in the length of the set of airports.
shortest_distance = None
shortest_distance_coordinates = None
point = (50.776435, -0.146834)
for airport in airports:
distance = compute_distance(point, airport)
if distance < shortest_distance or shortest_distance is None:
shortest_distance = distance
shortest_distance_coordinates = airport
The Question
How can this solution be improved? This might involve some way of pre-filtering the list of airports based on the coordinates of the location we are currently stood at, or sorting them in a certain order beforehand.
Using a k-dimensional tree:
Where 1.41421356 is the distance between the queried point and the nearest neighbour and 1 is the index of the neighbour.
See: http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.KDTree.query.html#scipy.spatial.KDTree.query
If your coordinates are unsorted, your search can only be improved slightly assuming it is
(latitude,longitude)
by filtering on latitude first as for earthbut that would not give a huge speedup.
If you sort the airports by latitude first then you can use a binary search for finding the first airport that could match (
airport_lat >= point_lat-tolerance
) and then only compare up to the last one that could match (airport_lat <= point_lat+tolerance
) - but take care of 0 degrees equaling 360. While you cannot use that library directly, the sources of bisect are a good start for implementing a binary search.While technically this way the search is still O(n), you have much fewer actual distance calculations (depending on tolerance) and few latitude comparisons. So you will have a huge speedup.
From this SO question:
where
node
is a tuple with two values (x, y) andnodes
is an array of tuples with two values ([(x_1, y_1), (x_2, y_2),]
)