I'm working with a large set of points represented by latitude/longitude pairs (the points are not necessarily unique, there could be several points in the set that are at the same location). The points are stored in a database.
What I need to do is figure out a way to efficiently perform a search to get the number of points that lie within a given radius (say, 25 miles) of an arbitrary point.
The count does not need to be 100% accurate - more importantly, it just has to be fast, and reasonably close to the correct count. Doing this with SQL is possible, by using a query with some trigonometry in the WHERE clause to filter points by their distance to a reference point. Unfortunately, this query is very, very expensive and caching is not likely to provide much help because the locations will be very spread out.
I'm ultimately looking to build some kind of in-memory structure that will be able to handle this kind of operation efficiently - trading off some of the accuracy and liveness of the data (maybe rebuilding it only once a day) in return for speed. I've been doing some research on kd-trees, but i'm not clear yet on how well this can be applied to latitude/longitude data (as opposed to x,y data in a 2d plane).
If anybody has any ideas or solutions I should look into, I'd really appreciate it - so thanks in advance.
I don't think you should use this solution. Having randomly thought about it a few days ago, I think that measuring the distance from a specific point the grid squares' locations will be based on circles rather than a uniformed grid. The further away from 0,0 you are the less accurate this will be!
What I did was to have 2 additional values on my PostalCode class. Whenever I update the Long/Lat on a PostalCode I calculate an X,Y distance from Long 0, Lat 0.
public static class MathExtender
{
public static double GetDistanceBetweenPoints(double sourceLatitude, double sourceLongitude, double destLatitude, double destLongitude)
{
double theta = sourceLongitude - destLongitude;
double distance =
Math.Sin(DegToRad(sourceLatitude))
* Math.Sin(DegToRad(destLatitude))
+ Math.Cos(DegToRad(sourceLatitude))
* Math.Cos(DegToRad(destLatitude))
* Math.Cos(DegToRad(theta));
distance = Math.Acos(distance);
distance = RadToDeg(distance);
distance = distance * 60 * 1.1515;
return (distance);
}
public static double DegToRad(double degrees)
{
return (degrees * Math.PI / 180.0);
}
public static double RadToDeg(double radians)
{
return (radians / Math.PI * 180.0);
}
}
Then I update my class like so:
private void CalculateGridReference()
{
GridReferenceX = MathExtender.GetDistanceBetweenPoints(0, 0, 0, Longitude);
GridReferenceY = MathExtender.GetDistanceBetweenPoints(0, 0, Latitude, 0);
}
So now I have an x,y grid distance (in miles) from grid reference 0,0 for each row in my DB. If I want to find all places with 5 miles of a long/lat I would first get the X,Y grid reference (say 25,75) then I would search 20..30, 70..80 in the DB, and further filter the results in memory using
MathExtensder.GetDistanceBetweenPoints(candidate.Lat, candidate.Long, search.Lat, search.Long) < TheRadiusOfInterest
The in DB part is ultra fast, and the in-memory part works on a smaller set to make it ultra accurate.
Use R-Trees
.
In Oracle, using Oracle Spatial, you can create an index:
CREATE INDEX ix_spatial ON spatial_table (locations) INDEXTYPE IS MDSYS.SPATIAL_INDEX;
that will create an R-Tree
for you and search over it.
You may use any Earth Model
you like: WGS84
, PZ-90
etc.
Use some kind of search tree for spatial data, e.g. a quad tree. More such data structures are referenced under "See also".
You can find an excellent explaination of Bombe's suggestion in the Jan Philip Matuschek's article "Finding Points Within a Distance of a Latitude/Longitude Using Bounding Coordinates".
Could you maybe supply a sample of your existing expensive query?
If you're doing proper great-circle calculation based on taking the sine() and cosine() of the reference point and the other data points, then a very substantial optimisation could be made by actually storing those sin/cos values in the database in addition to the lat/long values.
Alternatively, just use your database to extract a rectangle of lat/long ranges that match, and only afterwards filter out the ones that are outside the true circular radius.
But do bear in mind that one degree of longitude is a somewhat shorter distance at high latitudes than at the equator. It should be easy to figure out the right aspect ratio for that rectangle, though. You'd also have errors if you need to consider areas very close to the poles, as a rectanglar selection wouldn't cope with a circle that overlapped a pole.
This UDF (SQL Server) will get you the distance between two lat/lon points:
CREATE FUNCTION [dbo].[zipDistance] (
@Lat1 decimal(11, 6),
@Lon1 decimal(11, 6),
@Lat2 decimal(11, 6),
@Lon2 decimal(11, 6)
)
RETURNS
decimal(11, 6) AS
BEGIN
IF @Lat1 = @Lat2 AND @Lon1 = @Lon2
RETURN 0 /* same lat/long points, 0 distance = */
DECLARE @x decimal(18,13)
SET @x = 0.0
/* degrees -> radians */
SET @Lat1 = @Lat1 * PI() / 180
SET @Lon1 = @Lon1 * PI() / 180
SET @Lat2 = @Lat2 * PI() / 180
SET @Lon2 = @Lon2 * PI() / 180
/* accurate to +/- 30 feet */
SET @x = Sin(@Lat1) * Sin(@Lat2) + Cos(@Lat1) * Cos(@Lat2) * Cos(@Lon2 - @Lon1)
IF 1 = @x
RETURN 0
DECLARE @EarthRad decimal(5,1)
SET @EarthRad = 3963.1
RETURN @EarthRadius * (-1 * ATAN(@x / SQRT(1 - @x * @x)) + PI() / 2)
END
And, obviously, you can use this in a separate query, such as:
SELECT * FROM table WHERE [dbo].[zipDistance] < 25.0