Quest
The query selects all the points beginning with "Vancouver" and are within a 5 minute area from the center of all locations beginning with "Vancouver". For example, Vancouver South Fraser, Vancouver Fairview, and Vancouver Ballantree Place W have latitudes and longitudes within 5 minutes of their average latitude and longitude. The latitudes and longitudes are stored as (4915, 12311) integer pairs (meaning 49.15'N and 123.11'W).
SQL Code
The following SQL abomination does the trick:
SELECT
NAME
FROM
STATION
WHERE
DISTRICT_ID = '110'
AND NAME LIKE 'Vancouver%'
AND LATITUDE BETWEEN
(SELECT round((min(LATITUDE) + max(LATITUDE)) / 2)-5 FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%')
and
(SELECT round((min(LATITUDE) + max(LATITUDE)) / 2)+5 FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%')
AND LONGITUDE BETWEEN
(SELECT round((min(LONGITUDE) + max(LONGITUDE)) / 2)-5 FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%')
and
(SELECT round((min(LONGITUDE) + max(LONGITUDE)) / 2)+5 FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%')
ORDER BY
LATITUDE
Question
How can this query be simplified to remove the redundancy, without using a view?
Restrictions
The database is MySQL, but ANSI SQL is always nice.
Thank you!
select
name
from
(select
round((min(LATITUDE) + max(LATITUDE)) / 2) as LATITUDE,
round((min(LONGITUDE) + max(LONGITUDE)) / 2) as LONGITUDE
from STATION
where DISTRICT_ID = '110'
AND NAME LIKE 'Vancouver%') AS center
inner join STATION s
where
s.DISTRICT_ID = '110'
and s.NAME like 'Vancouver%'
and s.LATITUDE between center.LATITUDE - 5 and center.LATITUDE + 5
and s.LONGITUDE between center.LONGITUDE - 5 and center.LONGITUDE + 5
First of all do notice that your definition 'within 5 minutes of each other' does not define a single solution and that your (MIN()+MAX())/2 is not average but simply middle of min and max. You might be looking for AVG() in your subqueries.
Secondly you are not getting results within 5 secs from each other, but entries whose longitude and latitude are at most 10 secs (which on diagonal can be closer to 14).
In mysql you can use session variables such as:
SET @avg_lat := (SELECT round(avg(LATITUDE)) FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%');
SET @avg_long := (SELECT round(avg(LONGITUDE)) FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%');
SELECT
NAME
FROM
STATION
WHERE
DISTRICT_ID = '110'
AND pow(LATITUDE-@avg_lat,2)+pow(LONGITUDE-@avg_long,2)<25
ORDER BY
LATITUDE
Even though it is not necessary (as in the query written like above both variables occur only once).
EDIT: Oops, missread the question. It is the radius of center - so replace 25 with 100 (and while at it decide if you want to use less or equal). Also, if center is the center of bounding box then your (min()+max())/2 is right formula and not my suggestion. Still 'center of all locations' is a bit vague so I am leaving my answer (it is easy to change it).
EDIT2: Just noticed that the units in the my query are not right, if the latitude is stored in centiminutes, then the comparison should be to centiminutes too (10*100)^2=1000000
And finally, your decision to stick with (min()+max())/2 will lead to instances where you might have a single row which is far toward max and min which could make the query miss any of the results (and can happen, usually all the locations with similar names are next to each other, but it is not uncommon to have another location starting with the same name which is an isolated place far from the conglomerate of locations)
As for 5 minute area, to be completely precise better say it is 10x10 minute area, that's what the queries would return.
EDIT3: The formula used above for the distance is not very precise if you move away from the equator.
Here is a better approximations of distance formula
For serious work you might need something like this
Use a common table expression ...
with cte as
( SELECT round((min(LATITUDE) + max(LATITUDE)) / 2)-5 min_lat
, round((min(LATITUDE) + max(LATITUDE)) / 2)+5 max_lat
, round((min(LONGITUDE) + max(LONGITUDE)) / 2)-5 min_long
, round((min(LONGITUDE) + max(LONGITUDE)) / 2)+5 max_long
, DISTRICT_ID
, 'Vancouver%' AS NAME
FROM STATION
WHERE DISTRICT_ID = '110'
AND NAME LIKE 'Vancouver%'
group by DISTRICT_ID, 'Vancouver%')
SELECT
NAME
FROM
STATION , cte
WHERE
station.DISTRICT_ID = cte.DISTRICT_ID
AND station.NAME LIKE cte.NAME
AND station.LATITUDE BETWEEN cte.min_lat AND cte.max_lat
AND station.LONGITUDE BETWEEN cte.min_long AND cte.max_long
ORDER BY
station.LATITUDE
NB: I don't have access to a database right now, so I haven't been able to test this. Consequently I can't guarantee that it is works. I will test it when I can. The principle
holds.
I'm presuming that the original query provided an accurate result to your purpose. If that is the case, then you can consolidate the query by putting the calculation of the end points into a subquery.
Select ...
From Station As S
Cross Join (
Select Round( (Min(S1.Latitude) + Max(S1.Latitude)) / 2 ) As Latitude
, Round( (Min(S1.Longitude) + Max(S1.Longitude)) / 2 ) As Longitude
From Station As S1
Where S1.District_Id = '110'
And S1.Name Like 'Vancouver%'
) As S2
Where S.District_Id = '110'
And S.Name Like 'Vancouver%'
And S.Latitude Between (S2.Latitude - 5) And (S2.Latitude + 5)
And S.Longitude Between (S2.Longitude - 5) And (S2.Longitude + 5)
Order By S.Latitude
Whatever happened to good old Pythagoras (OK I know it doesn't really apply to curved surfaces - but should be a good enough approximation). If you are looking for the centre (actually, the centroid based on the interpretation applied by physicist rather than geometrists) of the set of coordinate pairs, then you should not be using MIN and MAX, although you might consider bounding the search based on the MIN and MAX). The only remaining fly in the ointment is that you store an integer representation of a string representation of a coordinate angle.
Consider:
SELECT b.name
FROM
(SELECT AVG(CALC(a.lattitude)) AS c_lat, AVG(CALC(a.longitude)) AS c_long
FROM station a
WHERE a.district_id='110'
AND a.name like 'VANCOUVER%'
) AS ilv,
station b
WHERE b.district_id='110'
AND b.name LIKE 'VANCOUVER%'
AND POW(ilv.c_lat-CALC(b.lattitude),2)
+ POW(olv.c_long-CALC(b.longitude),2)<=25;
Where the CALC function converts the stored value into a longitude/lattitude in minutes, i.e.
CALC(x)=(FLOOR(x/100)*60+MOD(x,100))
C.