I run a geolocation-based social network. Members can see other members based on how close they are.
Right now my MySQL query looks like:
$lat_min = $geo['user_lat'] - 5;
$lat_max = $geo['user_lat'] + 5;
$long_min = $geo['user_long'] - 5;
$long_max = $geo['user_long'] + 5;
$members_query = "SELECT " . $fields . " FROM members WHERE (user_lat BETWEEN " . $lat_min . " AND " . $lat_max . " AND user_long BETWEEN " . $long_min . " AND " . $long_max . ") OR (gc_lat BETWEEN " . $lat_min . " AND " . $lat_max . " AND gc_long BETWEEN " . $long_min . " AND " . $long_max . ")";
user_lat
and user_long
are coordinates based on geolocation if they have it turned on in their browser. gc_lat
and gc_long
and coordinates based on their IP address. These rows are all indexed in the database. I am pulling all members within 10 degrees.
The problem is this query takes about 2 seconds to perform for our 250,000+ members, and we want the site to scale.
ATTEMPT 2: I have tried assigning quadrants to each member e.g. "36x-99" ... I am rounding off the latitude and longitude to the nearest multiple of 3 to label the quadrant, and then I only pull quadrants within 12 degrees of the quadrant the member is in.
$members_query = "SELECT " . $fields . " FROM members WHERE quadrant IN ('36x-99', '33x-99', '30x-99', ...);
This gave me no noticeable difference in query speed.
Anyone have any ideas of what I should do? I need to find a solution that will allow the site to scale better.
The problem is the way you are storing data in the database is not suited for the type of task you are performing. Using Point
values in Geometry
data points is the way to go. Actually coded something 4+ years back for this purpose, but having issues finding it. But this post seems to cover it well.
EDIT Okay, found my old code, but it’s referring to old client data that I obviously cannot share. But the key to speed with coordinates in databases is using POINT
data stored in the database table with the type of GEOMETRY
. More details here on the official MySQL site. Since I have needed a reason to revisit this type of code—and the concepts—for a while here is a quick MySQL script I whipped up to create a sample table with sample data to convey the basic concepts. Once you understand what is happening, it opens up lots of cool options.
Also found this great/simple explanation of the concept as well.
And found another great assessment of spatial data in MySQL 5.6. Lots of great info on indexes & performance. Specifically regarding MySQL spatial index performance:
MyISAM tables support Spatial indexes, so the above queries will use those indexes.
And on the other side of that:
The InnoDB engine does not support spatial indexes, so those queries will be slow.
And here is my basic MySQL testing scripts to help illustrate the concept:
/* Create the database `spatial_test` */
CREATE DATABASE `spatial_test` CHARACTER SET utf8 COLLATE utf8_general_ci;
/* Create the table `locations` in `spatial_test` */
CREATE TABLE `spatial_test`.`locations` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`coordinates` point NOT NULL,
UNIQUE KEY `id` (`id`),
SPATIAL KEY `idx_coordinates` (`coordinates`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;
/* Insert some test data into it. */
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(27.174961 78.041822)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(27.985818 86.923596)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(44.427963 -110.588455)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(19.896766 -155.582782)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(40.748328 -73.985560)'));
INSERT INTO `spatial_test`.`locations` (`id`, `coordinates`) VALUES (NULL, GeomFromText('POINT(40.782710 -73.965310)'));
/* A sample SELECT query that extracts the 'latitude' & 'longitude' */
SELECT x(`spatial_test`.`locations`.`coordinates`) AS latitude, y(`spatial_test`.`locations`.`coordinates`) AS longitude FROM `spatial_test`.`locations`;
/* Another sample SELECT query calculates distance of all items in database based on GLength using another set of coordinates. */
SELECT GLength(LineStringFromWKB(LineString(GeomFromText(astext(PointFromWKB(`spatial_test`.`locations`.`coordinates`))), GeomFromText(astext(PointFromWKB(POINT(40.782710,-73.965310))))))) AS distance
FROM `spatial_test`.`locations`
;
/* Yet another sample SELECT query that selects items by using the Earth’s radius. The 'HAVING distance < 100' equates to a distance of less than 100 miles or kilometers based on what you set the query for. */
/* Earth’s diameter in kilometers: 6371 */
/* Earth’s diameter in miles: 3959 */
SELECT id, (3959 * acos(cos(radians(40.782710)) * cos(radians(x(`spatial_test`.`locations`.`coordinates`))) * cos(radians(y(`spatial_test`.`locations`.`coordinates`)) - radians(-73.965310)) + sin(radians(40.782710)) * sin(radians(x(`spatial_test`.`locations`.`coordinates`))))) AS distance
FROM `spatial_test`.`locations`
HAVING distance < 100
ORDER BY id
;