I have the following MySQL query:
SELECT
a.*,
( 3959 * acos( cos( radians('47.3909') ) * cos( radians( a.lat ) ) * cos( radians( a.lng ) - radians('-122.2637') ) + sin( radians('47.3909') ) * sin( radians( a.lat ) ) ) ) AS distance
FROM zip_codes AS a
ORDER BY distance ASC
LIMIT 1;
This will get me the zip code in my zip_codes
table that is nearest to the coordinates I have specified.
However, this is running pretty slow! Around 1 second. All similar queries run around 1 second as well. I am wondering if I can optimize either my table structure or the query to improve the query time.
This is the schema of my zip_codes
table:
CREATE TABLE `zip_codes` (
`zip` varchar(10) COLLATE utf8_unicode_ci NOT NULL,
`city` varchar(64) CHARACTER SET utf8 DEFAULT NULL,
`state` char(2) CHARACTER SET utf8 DEFAULT NULL,
`type` char(1) CHARACTER SET utf8 DEFAULT NULL,
`timezone` int(11) DEFAULT NULL,
`lat` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`lng` varchar(255) CHARACTER SET utf8 DEFAULT NULL,
`country` varchar(2) COLLATE utf8_unicode_ci DEFAULT '',
PRIMARY KEY (`zip`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=COMPACT;
UPDATE I changed the field type for lat
and lng
to DECIMAL and the queries are actually taking longer now, surprisingly!
ok, so I have to warn you that this is definitely not a perfect solution and has the following drawbacks:
it does not work for all points in the US. For example, if you select a point somewhere in Alaska, further than 50 km from every of the ZIP code centers in the table, it will return nothing
it requires MyISAM storage engine
in includes hardcoded values (see ~50 km in the 1st point). It's not exactly 50km and varies from the longitude.
Prerequisites:
Given the dump you sent, you should launch the following queries:
ALTER TABLE `zip_codes` ENGINE=MYISAM; -- changing your storage engine to MyISAM. It supports spatial indexes in MySQL
ALTER TABLE `zip_codes` ADD `pt` POINT NOT NULL; -- adding POINT() spatial datatype for zip cetner. Eventually, you may remove the old lat/lng decimal columns
ALTER TABLE `zip_codes` ADD `region` POLYGON NOT NULL; -- adding a rectangle over the center of the zip code. See below, this is something to utilize spatial index later in ST_Intersects function
// update the new columns with respective values
UPDATE `zip_codes` SET `pt` = POINT(lat,lng);
UPDATE `zip_codes` SET `region` = GEOMFROMTEXT(CONCAT('POLYGON((',lat-0.5,' ',lng-0.5,', ',lat+0.5,' ',lng-0.5,', ',lat+0.5,' ',lng+0.5,', ',lat-0.5,' ',lng+0.5,', ',lat-0.5,' ',lng-0.5,'))')); -- 0.5 is 0.5 degrees hardcode. There is a better approach and it's better to write a MySQL function that will increase the MBR with certain step until there is intersection (see my point #1 above, this is the best solution)
// create indexes on the newly created columns
ALTER TABLE `zip_codes` ADD SPATIAL INDEX(`region`);
ALTER TABLE `zip_codes` ADD SPATIAL INDEX(`pt`);
The new query
SELECT SQL_NO_CACHE zip,ST_Distance(`pt`,POINT('47.3909','-122.2637')) AS dst
FROM `zip_codes`
WHERE ST_Intersects(POINT('47.3909','-122.2637'),`region`)
ORDER BY `dst`
LIMIT 1;
on my machine it takes about 0.011 seconds, which is much better.
But again, see my comment above near the update statement, you should consider two things:
- coding a function that will increase that Minimum-bounding Rectangle with a step of 0.5 deg (for example) until there is intersection
- moving to PostgreSQL + PostGIS extension. Much more powerful if you deal with that number of records requiring spatial extensions