Eliminate subquery for average numeric value

2020-04-19 09:23发布

问题:

Quest

The query selects all the points beginning with "Vancouver" and are within a 5 minute area from the center of all locations beginning with "Vancouver". For example, Vancouver South Fraser, Vancouver Fairview, and Vancouver Ballantree Place W have latitudes and longitudes within 5 minutes of their average latitude and longitude. The latitudes and longitudes are stored as (4915, 12311) integer pairs (meaning 49.15'N and 123.11'W).

SQL Code

The following SQL abomination does the trick:

SELECT
  NAME
FROM
 STATION
WHERE
      DISTRICT_ID = '110'
  AND NAME LIKE 'Vancouver%'
  AND LATITUDE BETWEEN
    (SELECT round((min(LATITUDE) + max(LATITUDE)) / 2)-5 FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%')
    and
    (SELECT round((min(LATITUDE) + max(LATITUDE)) / 2)+5 FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%')
  AND LONGITUDE BETWEEN
    (SELECT round((min(LONGITUDE) + max(LONGITUDE)) / 2)-5 FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%')
    and
    (SELECT round((min(LONGITUDE) + max(LONGITUDE)) / 2)+5 FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%')
ORDER BY
  LATITUDE

Question

How can this query be simplified to remove the redundancy, without using a view?

Restrictions

The database is MySQL, but ANSI SQL is always nice.

Thank you!

回答1:

select 
  name
from 
  (select 
    round((min(LATITUDE) + max(LATITUDE)) / 2) as LATITUDE,
    round((min(LONGITUDE) + max(LONGITUDE)) / 2) as LONGITUDE
   from STATION 
   where DISTRICT_ID = '110' 
     AND NAME LIKE 'Vancouver%') AS center
  inner join STATION s
where
  s.DISTRICT_ID = '110' 
  and s.NAME like 'Vancouver%'
  and s.LATITUDE between center.LATITUDE - 5 and center.LATITUDE + 5
  and s.LONGITUDE between center.LONGITUDE - 5 and center.LONGITUDE + 5


回答2:

First of all do notice that your definition 'within 5 minutes of each other' does not define a single solution and that your (MIN()+MAX())/2 is not average but simply middle of min and max. You might be looking for AVG() in your subqueries.

Secondly you are not getting results within 5 secs from each other, but entries whose longitude and latitude are at most 10 secs (which on diagonal can be closer to 14).

In mysql you can use session variables such as:

SET @avg_lat := (SELECT round(avg(LATITUDE)) FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%');
SET @avg_long := (SELECT round(avg(LONGITUDE)) FROM STATION WHERE DISTRICT_ID = '110' AND NAME LIKE 'Vancouver%');

SELECT
  NAME
FROM
 STATION
WHERE
  DISTRICT_ID = '110'
  AND pow(LATITUDE-@avg_lat,2)+pow(LONGITUDE-@avg_long,2)<25
ORDER BY
  LATITUDE

Even though it is not necessary (as in the query written like above both variables occur only once).

EDIT: Oops, missread the question. It is the radius of center - so replace 25 with 100 (and while at it decide if you want to use less or equal). Also, if center is the center of bounding box then your (min()+max())/2 is right formula and not my suggestion. Still 'center of all locations' is a bit vague so I am leaving my answer (it is easy to change it).

EDIT2: Just noticed that the units in the my query are not right, if the latitude is stored in centiminutes, then the comparison should be to centiminutes too (10*100)^2=1000000

And finally, your decision to stick with (min()+max())/2 will lead to instances where you might have a single row which is far toward max and min which could make the query miss any of the results (and can happen, usually all the locations with similar names are next to each other, but it is not uncommon to have another location starting with the same name which is an isolated place far from the conglomerate of locations)

As for 5 minute area, to be completely precise better say it is 10x10 minute area, that's what the queries would return.

EDIT3: The formula used above for the distance is not very precise if you move away from the equator. Here is a better approximations of distance formula For serious work you might need something like this



回答3:

Use a common table expression ...

with cte as
 (  SELECT round((min(LATITUDE) + max(LATITUDE)) / 2)-5 min_lat
           , round((min(LATITUDE) + max(LATITUDE)) / 2)+5 max_lat
           , round((min(LONGITUDE) + max(LONGITUDE)) / 2)-5 min_long
           , round((min(LONGITUDE) + max(LONGITUDE)) / 2)+5 max_long
           , DISTRICT_ID
           ,  'Vancouver%' AS NAME 
    FROM STATION 
    WHERE DISTRICT_ID = '110' 
AND NAME LIKE 'Vancouver%'
group by DISTRICT_ID,  'Vancouver%')
SELECT
  NAME
FROM
 STATION , cte
WHERE
      station.DISTRICT_ID = cte.DISTRICT_ID
  AND station.NAME LIKE cte.NAME
  AND station.LATITUDE BETWEEN cte.min_lat AND cte.max_lat
   AND station.LONGITUDE BETWEEN cte.min_long AND cte.max_long
ORDER BY
  station.LATITUDE

NB: I don't have access to a database right now, so I haven't been able to test this. Consequently I can't guarantee that it is works. I will test it when I can. The principle holds.



回答4:

I'm presuming that the original query provided an accurate result to your purpose. If that is the case, then you can consolidate the query by putting the calculation of the end points into a subquery.

Select ...
From Station As S
    Cross Join  (
                Select Round( (Min(S1.Latitude)  + Max(S1.Latitude)) / 2 ) As Latitude
                    , Round( (Min(S1.Longitude)  + Max(S1.Longitude)) / 2 ) As Longitude
                From Station As S1
                Where S1.District_Id = '110'
                    And S1.Name Like 'Vancouver%'
                ) As S2
Where S.District_Id = '110'
    And S.Name Like 'Vancouver%'
    And  S.Latitude Between (S2.Latitude - 5) And (S2.Latitude + 5)
    And  S.Longitude Between (S2.Longitude - 5) And (S2.Longitude + 5)
Order By S.Latitude


回答5:

Whatever happened to good old Pythagoras (OK I know it doesn't really apply to curved surfaces - but should be a good enough approximation). If you are looking for the centre (actually, the centroid based on the interpretation applied by physicist rather than geometrists) of the set of coordinate pairs, then you should not be using MIN and MAX, although you might consider bounding the search based on the MIN and MAX). The only remaining fly in the ointment is that you store an integer representation of a string representation of a coordinate angle.

Consider:

SELECT b.name
FROM
(SELECT AVG(CALC(a.lattitude)) AS c_lat, AVG(CALC(a.longitude)) AS c_long
  FROM station a 
  WHERE a.district_id='110'
  AND a.name like 'VANCOUVER%'
) AS ilv,
station b
WHERE b.district_id='110'
AND b.name LIKE 'VANCOUVER%'
AND POW(ilv.c_lat-CALC(b.lattitude),2)
     + POW(olv.c_long-CALC(b.longitude),2)<=25;

Where the CALC function converts the stored value into a longitude/lattitude in minutes, i.e.

CALC(x)=(FLOOR(x/100)*60+MOD(x,100))

C.