I have an app where basically I have a huge table (100 million records) of information, with each row containing a lat/long value.
I'm constantly querying this table to get all the records that fit within a radius around a certain point. For example, "all records within 5 miles of 39.89288,-104.919434"
For this, I have an index over the Lat/Long columns, and I'm getting the "bounding square" of points, and then discarding all the points that fall outside of the circle in my ASP.Net app, since that was faster than doing the circle calculation in SQL Server.
NOTE: This is all data about the US, so I'm considering the earth to be flat for my calculations, which is precise enough for my needs.
Now, the main problem with the Lat/Long index is that being "a square" of points, and since i'm trying to find "Lat between x and y" and "Long between x and y", it can't really use the index super-efficiently, as it could if I were searching over "a line" of points.
I've been reading up on SQL 2008's spatial features, but I haven't found enough concrete information to know whether this is useful for me.
So the question is: Does SQL 2008 have some kind of different type of index that will make this specific type of query much faster than I can with SQL 2005?
Yes, this can be done very well with SQL 2008 spatial data. There is some expertise/trial and error(?) in setting the spatial indexing up at the right mesh level, but after that it is supposed it be great (relayed to me by friends, I have not used it in production myself).
For your purposes (lat/lon) you will wan the geographic type and not the geometric. I believe that the spatial indexes set up a "nested bounding triangle" type of indexing that is an improvement over the "bounding box" type of preconditioning that we are forced to do in SQL without it.
OK, I would suggest starting at this post at Grant Fritchey's "Home of the Scary DBA" Blog (tell him I sent you if you want to ask questions :-) ). This is a good explanation of some performance analysis & tuning of spatial indexes he was just starting to learn and also includes links to lots of other material.
you know, I was doing Lat / Longitude queries at Starbucks about 5 years ago...
and basically, we wanted to correlate stores to distribution centers.. I was working in their operations department, and they honestly couldn't tell which store was supplied by a particular warehouse.
So I ended up coming up with this 'magic circle' algorithm.
basically, they had a bunch of queries that looked like this: select * from table1, table2 Where UdfDistance(table1.Lat, table1.Long, table2.Lat, table2.Long) >= 250
I ended up coming up with something REALLY similiar, but it ran a LOT faster
select * from table1, table2 Where UdfDistance(table1.Lat, table1.Long, table2.Lat, table2.Long) >= 250 and table1.Lat between (table2.Lat - 1) and (table2.Lat + 1) and table1.Long between (table2.Long - 1) and (table2.Long + 1)
Basically- don't try comparing two geographical points if there were more than 1 degree (lat and long) difference between the two.
In other words- it used NORMAL RELATIONAL INDEXES to filter out a lot of the values, and then the UDF distance calculation had a LOT less information to process.
Hope that helps, I'll try to help clarify if I need to
Found this:
For SQL 2008:
http://blogs.lessthandot.com/index.php/DataMgmt/DataDesign/sql-server-2008-proximity-search-with-th
Apparently it IS possible
NOTE: The SQL 2005 version of that article doesn't perform too well. I've tried that kind of things and it works better to just get the square from SQL Server, and then cull the circle in your own code.
More links of interest:
http://msdn.microsoft.com/en-us/library/bb964712.aspx
(Finally an explanation!!)
And a sample query... Apparently this is how to do the search I want (points within a 5-mile circle):
(LocGeog is the Geography column)
Strangely, though, this runs WAY slower than my regular old query (7 times slower), so I'm obviously still doing something very wrong.
Yes! Check out this article about spatial indexes. You'll see that these types of indexes work better than the "indexed rectangle" approach. Besides, not only will you be able to effectively query for "is point near another point", but do all other sorts of geographical operations. Here's a complete list of all available methods on the type.