How to get a list of suburbs surrounding a locatio

2019-01-15 22:03发布

I get a list of suburbs within a specified distance from a single location using Queries A.

I’m trying to adapt Queries A to get a list of suburbs surrounding location1, then get list of suburbs surrounding location2 and so on (I'll call this Queries B). Essentially Queries B is doing the same as Queries A, but repeating it for each separate location. My question- how can I do this using MySQL only. Suggestions on how to do this are much appreciated.


Here is a sample of the data I am working with. SqlFiddle here

CREATE TABLE `geoname` (
    `geonameid` INT(11) NOT NULL,
    `asciiname` VARCHAR(200) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `country` VARCHAR(2) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `latitude` DECIMAL(10,7) NULL DEFAULT NULL,
    `longitude` DECIMAL(10,7) NULL DEFAULT NULL,
    `fcode` VARCHAR(10) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `population` INT(11) NULL DEFAULT NULL,
    `area` INT(11) NULL DEFAULT NULL,
    PRIMARY KEY (`geonameid`),
    INDEX `asciiname` (`asciiname`),
    INDEX `country` (`country`),
    INDEX `latitude` (`latitude`),
    INDEX `longitude` (`longitude`),
    INDEX `fcode` (`fcode`),
    INDEX `population` (`population`),
    INDEX `area` (`area`)
)
COLLATE='utf8_unicode_ci'
ENGINE=InnoDB
;

INSERT INTO geoname(geonameid, asciiname, country, latitude, longitude, fcode, population, area) VALUES
(2147497, 'Tamworth', 'AU', -31.0904800, 150.9290500, 'PPL', 47597, 72),
(8597559, 'Tamworth', 'AU', -21.0457400, 143.6685200, 'PPL', 0, 0),
(8805708, 'Tamworth', 'AU', -21.0471300, 143.6692000, 'HMSD', 0, 0),
(2655603, 'Birmingham', 'GB', 52.4814200, -1.8998300, 'PPL', 984333, 599),
(4782167, 'Roanoke', 'US', 37.2709700, -79.9414300, 'PPL', 97032, 321),
(10114336, 'East Tamworth', 'AU', -31.0854800, 150.9372100, 'PPLX', 2621, 0),
(10114337, 'North Tamworth', 'AU', -31.0786200, 150.9221900, 'PPPL', 0, 0),
(2143940, 'West Tamworth', 'AU', -31.1023600, 150.9144700, 'PPLX', 0, 0),
(2656867, 'Aston', 'GB', 52.5000000, -1.8833300, 'PPLX', 0, 0),
(2646814, 'Hockley', 'GB', 52.5000000, -1.9166700, 'PPLX', 13919, 0),
(2650236, 'Edgbaston', 'GB', 52.4623000, -1.9211500, 'PPLX', 0, 0),
(4754994, 'Cumberland Forest', 'US', 37.1401300, -80.3217100, 'PPLX', 0, 0),
(4774999, 'Mountain Top Estates', 'US', 37.1376300, -80.3247700, 'PPPL', 0, 0),
(4764119, 'Highland Park', 'US', 37.2237400, -80.3917200, 'PPLX', 0, 0);

What I tried

Queries A- get suburbs surrounding a single point of interest

SELECT @lat := latitude, @lng :=longitude FROM geoname WHERE asciiname = 'Tamworth' and country='AU' and population>0 and fcode='PPL';

SELECT
    name as suburb, 'Tamworth' as point_of_interest, country,
    (
    (
    ACOS(SIN(@lat * PI() / 180) * SIN(latitude * PI() / 180) + COS(@lat * PI() / 180) * COS(latitude * PI() / 180) * COS((
    @lng - longitude
    ) * PI() / 180)) * 180 / PI()
    ) * 60 * 1.851999999962112
    ) AS distance
  FROM geoname
    WHERE fcode='PPLX' OR fcode='PPPL'
  HAVING distance <= '60'
  ORDER BY distance ASC;

RESULTS

The query above returns one location for the point of interest.

+---------------------------------+
|     @lat       |      @lng      |
+---------------------------------+
| 52.6339900     |   -1.6958700   |
+---------------------------------+

and a list of suburbs surrounding Tamworth.

    | point_of_interest |      suburb          | country |           distance |
    |-------------------|----------------------|---------|--------------------|
    |          Tamworth |  East Tamworth       |      AU | 0.9548077598752538 |
    |          Tamworth |  North Tamworth      |      AU | 1.4707125875055387 |
    |          Tamworth |  West Tamworth       |      AU |  1.915025922482298 |

I tried to create Queries B using MySQL user variables, GROUP_CONCAT() and FIND_IN_SET(). The idea was that I could cycle through the values a bit like using an array. I can post my last attempt if you wish, but I am not even close to a solution (not for lack of trying).

UPDATE: Here is one of my last attempts.

SELECT @lat := GROUP_CONCAT(latitude), @lng :=GROUP_CONCAT(longitude), @city :=GROUP_CONCAT(asciiname), @area :=GROUP_CONCAT(area) FROM geoname WHERE (asciiname = 'Tamworth' or asciiname = 'Birmingham' or asciiname = 'Roanoke') and population>0 and fcode='PPL';

SELECT
    FIND_IN_SET(asciiname, @city) as point_of_interest, asciiname as suburb, country,
    (
    (
    ACOS(SIN(FIND_IN_SET(latitude, @lat) * PI() / 180) * SIN(latitude * PI() / 180) + COS(FIND_IN_SET(latitude, @lat) * PI() / 180) * COS(latitude * PI() / 180) * COS((
    FIND_IN_SET(longitude, @lng) - longitude
    ) * PI() / 180)) * 180 / PI()
    ) * 60 * 1.851999999962112
    ) AS distance
  FROM geoname   
  HAVING distance <= FIND_IN_SET(distance, @area)
  ORDER BY distance ASC;

Desired Results for Queries B. For 3 points of interest-Tamworth, Birmingham and Roanoke- this is what I would expect to see.

| point_of_interest |      suburb          | country |           distance |
|-------------------|----------------------|---------|--------------------|
|          Tamworth |  East Tamworth       |      AU | 0.9548077598752538 |
|          Tamworth | North Tamworth       |      AU | 1.4707125875055387 |
|          Tamworth |  West Tamworth       |      AU |  1.915025922482298 |
|        Birmingham |        Aston         |      GB |  2.347111909955497 |
|        Birmingham |       Hockley        |      GB | 2.3581405942861164 |
|        Birmingham |      Edgbaston       |      GB |  2.568384753388139 |
|           Roanoke |    Cumberland Forest |      US |  36.66226789588173 |
|           Roanoke | Mountain Top Estates |      US |  37.02185777044897 |
|           Roanoke |        Highland Park |      US | 40.174566427830094 |

Suggestions on how to do this using MySQL are greatly appreciated.

2条回答
Viruses.
2楼-- · 2019-01-15 22:47

Using Spatial Data Types.

Well first of all if you have a lot of geospatial data, you should be using mysql's geospatial extensions rather than calculations like this. You can then create spatial indexes that would speed up many queries and you don't have to write long drawn out queries like the one above.

Using a comparision with ST_Distance or creating a geometry with the radius of interest along with ST_within might give you good results and could be a lot faster than the current. However the best and fastest way to achieve this, ST_Dwithin isn't implemented yet in mysql.

These data types are available mysql 5.7 onwards but it's totally worth the effort to upgrade your DB if you are in an older version.

The new table structure.

CREATE TABLE `geoname2` (
    `geonameid` INT(11) NOT NULL,
    `asciiname` VARCHAR(200) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `country` VARCHAR(2) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `pt` POINT,
    `fcode` VARCHAR(10) NULL DEFAULT NULL COLLATE 'utf8_unicode_ci',
    `population` INT(11) NULL DEFAULT NULL,
    `area` INT(11) NULL DEFAULT NULL,
    PRIMARY KEY (`geonameid`),
    INDEX `asciiname` (`asciiname`),
    INDEX `country` (`country`),
    INDEX `fcode` (`fcode`),
    INDEX `population` (`population`),
    INDEX `area` (`area`),
    SPATIAL INDEX `pt` (`pt`)
)COLLATE='utf8_unicode_ci'
ENGINE=InnoDB;

Notice that the latitude and longitude fields have been replaced by pt and their indexes have been replaced by a single index.

The new query A

SELECT asciiname as suburb, 'Tamworth' as point_of_interest, country,  
  ST_DISTANCE(`pt`, POINT(@lat,@lng)) as distance 
FROM geoname2     
WHERE (fcode='PPLX' OR fcode='PPPL') AND ST_DISTANCE(`pt`, POINT(@lat,@lng))  <= 1
ORDER BY distance ASC;

Clearly it's a lot simpler. It's probably faster too but with only 14 records to test on it's hard to reach any sort of conclusion, no index will be used for such small tables.

Note that ST_DISTANCE results are returned in degrees it's conventionally assumed that 1 degree is about 60 miles or 111 km (you have done so in your calculation)

BTW, In the existing setup, you do have an index on latitude and longitude but please note that mysql can use only one index per table so if you don't adopt geospatial queries you might want to convert that into a single composite index on latitude,longitude.

The full query.

Now the above query can be modified as follows to give the 'query B' in it's new form.

SELECT DISTINCT  g1.asciiname, g2.asciiname ,ST_DISTANCE(g1.pt, g2.pt) *111 as distance FROM geoname2 g1 
INNER JOIN (SELECT `pt`, asciiname  
    FROM geoname2 
     WHERE (fcode='PPLX' OR fcode='PPPL') AND 
       ST_DISTANCE(`pt`, POINT(@lat,@lng))  <= 1) as g2
WHERE ST_DISTANCE(g1.pt,g2.pt) < 1 
AND g1.asciiname != g2.asciiname ORDER BY distance ASC;

Note again that I am assuming 1 degree (approximately 111 km to be close to one another)

查看更多
霸刀☆藐视天下
3楼-- · 2019-01-15 23:06

You simply need to perform a self-join. Joining tables is a very fundamental part of SQL—you really should read up on it before trying to understand this answer further.

SELECT   poi.asciiname,
         suburb.asciiname,
         suburb.country,
         DEGREES(
           ACOS(
             SIN(RADIANS(   poi.latitude))
           * SIN(RADIANS(suburb.latitude))
           + COS(RADIANS(   poi.latitude))
           * COS(RADIANS(suburb.latitude))
           * COS(RADIANS(poi.longitude - suburb.longitude))
           )
         ) * 60 * 1.852 AS distance
FROM     geoname AS poi
    JOIN geoname AS suburb
WHERE    poi.asciiname IN ('Tamworth', 'Birmingham', 'Roanoke')
     AND poi.population > 0
     AND poi.fcode = 'PPL'
     AND suburb.fcode IN ('PPLX', 'PPPL')
HAVING   distance <= 60
ORDER BY poi.asciiname, distance

See it on sqlfiddle.

You'll have noticed that I've used MySQL's IN() operator as a shorthand for value = A OR value = B OR ....

You'll also have noticed that I've used MySQL's DEGREES() and RADIANS() functions rather than trying to perform such conversions explicitly.

You were then multiplying minutes of latitude by a factor of 1.851999999962112, which was rather strange: it's extremely close to 1.852, which is the precise number of kilometres in a nautical mile (historically defined as a minute of latitude), but yet bizarrely slightly different—I've assumed you meant to use that instead.

Finally, you had the literal value by which you were filtering the distances in the resultset as a string, i.e. '60', whereas obviously this is a numeric value and should be unquoted.

查看更多
登录 后发表回答