Given a set of ordered points, and a path made up of ordered lat,lon points that goes near those points (in lat/lon coordinates), I want to associate the points with the path, ideally with good algorithmic complexity (n*log(n)) or better, but maybe that might not be realistic.
The below diagram illustrates my question better. The blue line is the ordered path that is provided, and the red points are in the same order as the blue line. The green path is my desired result, which merges the red points and the blue line into a new ordered path.
Some threshold would have to be set for the distance of the red points from the blue path, let's assume that the red points are at most 50 meters from the blue path.
So, this is definitely the most mathmatical and unusual question I have asked on Stack Overflow. Any ideas would be great on solving this. I am planning to use it to merge GTFS shape data with trip data which describes stop times, and build it into the open source project, Depart App.
Thanks for your help!
Perhaps you could use a custom sweep line algorithm, this should reduce the complexity of finding the nearest line segment.
For each red point iterate through blue segments and find the best segment for it. What exactly "best" is depends on the task, but it looks like a good measure would be "how much longer path becomes". If A is a red point and BC is a segment, then the best segment will minimize this: f(A,BC)=(|BA|+|AC|-|BC|).
When the best segment is found, it should be replaced with BA and AC. In the same way other points should be processed.
Without optimizations this will give O(N^2).
If points are distributed more or less uniformly and segment length is significantly less than size of entire figure, some space partitioning may help.
Building on the other suggestions provided here, I think I have found a O(n) algorithm that works effectively.
The idea is to first pick the first red point as the starting point (or could choose the first blue point). Then compare the distance from this point to the next red point and the next blue point, pick the closer. Repeat until both lists are depleted. This seems to be quite effective on the Translink data set. I will give an update if I make tweaks to this idea.
It seems to me, you have two sets of points, the lat/lon points, and the red points. One of the lat/lon points is your starting point. Now considering all the other points as a set, use an (approximate?) nearest neighbor algorithm to find the next point. Now repeat. The only trouble is, that nearest neighbor algorithms tend to be O(n), which makes what you want to do O(n^2).