Given a set of points s (a set of x,y coordinates) and a path that is made up of line segments joining a set of points l, describe an efficient algorithm that can be used to find a subset of points from s that are within the specified distance d of the path l.
A practical application of this may be to find a list of restaurants within 10 miles anywhere along a road trip path between cites.
For example, in the following diagram, the points in green would be included in the search results.
Solutions would be preferred in C#, but bonus points may be given for a SQL based approach :-)
Given general computing tools, your best algorithm is going to be some variation on filtering out obviously uninteresting points and finding the distance from every line segment to each remaining point. (The polygon solution suggested is wrong -- the area of interest is the union of that polygon with the circle of radius d around each point on l -- and actually less efficient than simply finding the distance from each point to every line segment.)
Which filters are best will depend on the nature of your data -- for instance, in the sample diagram, filtering on l's bounding box (plus d) will be very useful.
An interesting filter would be: given point p defining l, take a circle of radius r, where r is the maximum length of the two segments defined in part by p plus d. Only points within this circle can be close enough to those two segments to be in our solution set, so we can quickly determine whether or not we can skip those two line segment distance calculations. (This will be less efficient if some line segments are very long, but if they are, those line segments can be easily broken up into smaller chunks.)
I believe these two classes will answer your question. I built the GetArea() function using Heron's Formula. Make sure that the Segment Points are always passed first to the IsPointWithinDistanceToLineSegment and the TestPoint is always passed 3rd.
EDIT: I stupidly used Point which only allows integers for X and Y. You'll need to fix this with another class that takes doubles or floats as X and Y...
Could you use a quad-tree to partition the space up into segments, and then only for points in the segments close to your path?
If you want to do at least some of the work in SQL, you can compute a bounding box for the path, then incorporate into your query the condition that the location is inside the bounding box. You run one of the other algorithms against just the returned rows.
This at least prevents you from having to download the entire data base for each path.
I also thought about this some time ago. I think, efficient is misleading. Just testing all line segments for each point is good enough. It is very cheap to calculate the distance. If there are many points, you can also think about refining the strategy which points to choose using a level-set approach. i.e.
that's maybe not complete, but should be fast and avoids checking points far away and quite ok.
The only solution to this is along the lines of:
The inner loop can be terminated early once a point that lies within the constraint is found. Note that the inner and outer loops can be transposed.
The question then becomes one of determining if a point is within the constraint. mbeckish suggests using a simple rectangle test, where the rectangle is formed by extruding along the line's perpendicular, but this will fail for points near the end points but outside this rectangle. Extruding the rectangle along the direction of the line as well will also fail since the points near the end should really use a point in circle test:
where the * is inside the expanded rectangle but outside the rounded end cap which would be a more strict test.
Now, the distance test might not be a 'as the crow flies' test but a graph search, for example, points within x miles of a road using only roads to connect them together:
In the above diagram, the target is within the search area but to get to the target along the available roads would be greater than the allowed distance.
However you choose to implement the test, the basic loop in a loop algorithm will be required.
Ways to check the constraint where the constraint is an 'as the crow flies' constraint:
Geometrically: First, determine the distance from the point P to the line. Then, if the point is within the constraint project the point P onto the line segment, where the line is defined as:
where P1 and P2 are the end points and n is the parametric variable. If the value of n for the projected P is in the range 0 <= n <= 1 then the point is between P1 and P2. Finally, do a point in circle test for circles centred on P1 and P2.
Transformations: Create a transformation matrix for each line segment such that P1 is transformed to the origin and P2 is transformed to (|P1-P2|, 0). Then apply each transform to all points and then test each point in the rectangle (0, -constraint) - (|P1-P2|, constraint). This method can be highly optimised using SIMD or a GPU
Graphically: Draw the line segments to a bitmap using a pen with rounded end caps and a width proportional to the constraint distance. Then, for each test point, check the pixel in the bitmap corresponding to the point. This is not accurate (but bigger bitmaps create more accurate results but need more memory) but is pretty quick once the bitmap is created.
If the constraint is defined by the route along a graph it becomes more complex. You need to look at breadth first searches where the start points are the end of each line segment and the end point is the potential target. If a line segment has junctions along its length, then break the line segment into segments without junctions.