I get a list of suburbs within a specified distance from a single location using Queries A.
I’m trying to adapt Queries A to get a list of suburbs surrounding location1, then get list of suburbs surrounding location2 and so on (I'll call this Queries B). Essentially Queries B is doing the same as Queries A, but repeating it for each separate location. My question- how can I do this using MySQL only. Suggestions on how to do this are much appreciated.
Here is a sample of the data I am working with. SqlFiddle here
CREATE TABLE `geoname` (
`geonameid` INT(11) NOT NULL,
`asciiname` VARCHAR(200) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
`country` VARCHAR(2) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
`latitude` DECIMAL(10,7) NULL DEFAULT NULL,
`longitude` DECIMAL(10,7) NULL DEFAULT NULL,
`fcode` VARCHAR(10) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
`population` INT(11) NULL DEFAULT NULL,
`area` INT(11) NULL DEFAULT NULL,
PRIMARY KEY (`geonameid`),
INDEX `asciiname` (`asciiname`),
INDEX `country` (`country`),
INDEX `latitude` (`latitude`),
INDEX `longitude` (`longitude`),
INDEX `fcode` (`fcode`),
INDEX `population` (`population`),
INDEX `area` (`area`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB
;
INSERT INTO geoname(geonameid, asciiname, country, latitude, longitude, fcode, population, area) VALUES
(2147497, 'Tamworth', 'AU', -31.0904800, 150.9290500, 'PPL', 47597, 72),
(8597559, 'Tamworth', 'AU', -21.0457400, 143.6685200, 'PPL', 0, 0),
(8805708, 'Tamworth', 'AU', -21.0471300, 143.6692000, 'HMSD', 0, 0),
(2655603, 'Birmingham', 'GB', 52.4814200, -1.8998300, 'PPL', 984333, 599),
(4782167, 'Roanoke', 'US', 37.2709700, -79.9414300, 'PPL', 97032, 321),
(10114336, 'East Tamworth', 'AU', -31.0854800, 150.9372100, 'PPLX', 2621, 0),
(10114337, 'North Tamworth', 'AU', -31.0786200, 150.9221900, 'PPPL', 0, 0),
(2143940, 'West Tamworth', 'AU', -31.1023600, 150.9144700, 'PPLX', 0, 0),
(2656867, 'Aston', 'GB', 52.5000000, -1.8833300, 'PPLX', 0, 0),
(2646814, 'Hockley', 'GB', 52.5000000, -1.9166700, 'PPLX', 13919, 0),
(2650236, 'Edgbaston', 'GB', 52.4623000, -1.9211500, 'PPLX', 0, 0),
(4754994, 'Cumberland Forest', 'US', 37.1401300, -80.3217100, 'PPLX', 0, 0),
(4774999, 'Mountain Top Estates', 'US', 37.1376300, -80.3247700, 'PPPL', 0, 0),
(4764119, 'Highland Park', 'US', 37.2237400, -80.3917200, 'PPLX', 0, 0);
What I tried
Queries A- get suburbs surrounding a single point of interest
SELECT @lat := latitude, @lng :=longitude FROM geoname WHERE asciiname = 'Tamworth' and country='AU' and population>0 and fcode='PPL';
SELECT
name as suburb, 'Tamworth' as point_of_interest, country,
(
(
ACOS(SIN(@lat * PI() / 180) * SIN(latitude * PI() / 180) + COS(@lat * PI() / 180) * COS(latitude * PI() / 180) * COS((
@lng - longitude
) * PI() / 180)) * 180 / PI()
) * 60 * 1.851999999962112
) AS distance
FROM geoname
WHERE fcode='PPLX' OR fcode='PPPL'
HAVING distance <= '60'
ORDER BY distance ASC;
RESULTS
The query above returns one location for the point of interest.
+---------------------------------+
| @lat | @lng |
+---------------------------------+
| 52.6339900 | -1.6958700 |
+---------------------------------+
and a list of suburbs surrounding Tamworth.
| point_of_interest | suburb | country | distance |
|-------------------|----------------------|---------|--------------------|
| Tamworth | East Tamworth | AU | 0.9548077598752538 |
| Tamworth | North Tamworth | AU | 1.4707125875055387 |
| Tamworth | West Tamworth | AU | 1.915025922482298 |
I tried to create Queries B using MySQL user variables, GROUP_CONCAT()
and FIND_IN_SET()
. The idea was that I could cycle through the values a bit like using an array. I can post my last attempt if you wish, but I am not even close to a solution (not for lack of trying).
UPDATE: Here is one of my last attempts.
SELECT @lat := GROUP_CONCAT(latitude), @lng :=GROUP_CONCAT(longitude), @city :=GROUP_CONCAT(asciiname), @area :=GROUP_CONCAT(area) FROM geoname WHERE (asciiname = 'Tamworth' or asciiname = 'Birmingham' or asciiname = 'Roanoke') and population>0 and fcode='PPL';
SELECT
FIND_IN_SET(asciiname, @city) as point_of_interest, asciiname as suburb, country,
(
(
ACOS(SIN(FIND_IN_SET(latitude, @lat) * PI() / 180) * SIN(latitude * PI() / 180) + COS(FIND_IN_SET(latitude, @lat) * PI() / 180) * COS(latitude * PI() / 180) * COS((
FIND_IN_SET(longitude, @lng) - longitude
) * PI() / 180)) * 180 / PI()
) * 60 * 1.851999999962112
) AS distance
FROM geoname
HAVING distance <= FIND_IN_SET(distance, @area)
ORDER BY distance ASC;
Desired Results for Queries B. For 3 points of interest-Tamworth, Birmingham and Roanoke- this is what I would expect to see.
| point_of_interest | suburb | country | distance |
|-------------------|----------------------|---------|--------------------|
| Tamworth | East Tamworth | AU | 0.9548077598752538 |
| Tamworth | North Tamworth | AU | 1.4707125875055387 |
| Tamworth | West Tamworth | AU | 1.915025922482298 |
| Birmingham | Aston | GB | 2.347111909955497 |
| Birmingham | Hockley | GB | 2.3581405942861164 |
| Birmingham | Edgbaston | GB | 2.568384753388139 |
| Roanoke | Cumberland Forest | US | 36.66226789588173 |
| Roanoke | Mountain Top Estates | US | 37.02185777044897 |
| Roanoke | Highland Park | US | 40.174566427830094 |
Suggestions on how to do this using MySQL are greatly appreciated.
Using Spatial Data Types.
Well first of all if you have a lot of geospatial data, you should be using mysql's geospatial extensions rather than calculations like this. You can then create spatial indexes that would speed up many queries and you don't have to write long drawn out queries like the one above.
Using a comparision with ST_Distance or creating a geometry with the radius of interest along with ST_within might give you good results and could be a lot faster than the current. However the best and fastest way to achieve this, ST_Dwithin isn't implemented yet in mysql.
These data types are available mysql 5.7 onwards but it's totally worth the effort to upgrade your DB if you are in an older version.
The new table structure.
Notice that the
latitude
andlongitude
fields have been replaced bypt
and their indexes have been replaced by a single index.The new query A
Clearly it's a lot simpler. It's probably faster too but with only 14 records to test on it's hard to reach any sort of conclusion, no index will be used for such small tables.
Note that ST_DISTANCE results are returned in degrees it's conventionally assumed that 1 degree is about 60 miles or 111 km (you have done so in your calculation)
BTW, In the existing setup, you do have an index on latitude and longitude but please note that mysql can use only one index per table so if you don't adopt geospatial queries you might want to convert that into a single composite index on
latitude,longitude
.The full query.
Now the above query can be modified as follows to give the 'query B' in it's new form.
Note again that I am assuming 1 degree (approximately 111 km to be close to one another)
You simply need to perform a self-join. Joining tables is a very fundamental part of SQL—you really should read up on it before trying to understand this answer further.
See it on sqlfiddle.
You'll have noticed that I've used MySQL's
IN()
operator as a shorthand forvalue = A OR value = B OR ...
.You'll also have noticed that I've used MySQL's
DEGREES()
andRADIANS()
functions rather than trying to perform such conversions explicitly.You were then multiplying minutes of latitude by a factor of
1.851999999962112
, which was rather strange: it's extremely close to1.852
, which is the precise number of kilometres in a nautical mile (historically defined as a minute of latitude), but yet bizarrely slightly different—I've assumed you meant to use that instead.Finally, you had the literal value by which you were filtering the distances in the resultset as a string, i.e.
'60'
, whereas obviously this is a numeric value and should be unquoted.