Using WHERE clause to find POI within a range of d

2020-02-14 02:50发布

问题:

I'm using following sql code to find out 'ALL' poi closest to the set coordinates, but I would want to find out specific poi instead of all of them. When I try to use the where clause I get an error and it doesn't work and this is where I'm currently stuck, since I only use one table for all the coordinates off all poi's.

SET @orig_lat=55.4058;  
SET @orig_lon=13.7907; 
SET @dist=10;
SELECT 
    *, 
    3956 * 2 * ASIN(SQRT(POWER(SIN((@orig_lat -abs(latitude)) * pi()/180 / 2), 2) 
    + COS(@orig_lat * pi()/180 ) * COS(abs(latitude) * pi()/180) 
    * POWER(SIN((@orig_lon - longitude) * pi()/180 / 2), 2) )) as distance 
FROM geo_kulplex.sweden_bobo
HAVING distance < @dist 
ORDER BY distance limit 10;

回答1:

The problem is that you can not reference an aliased column (distancein this case) in a select or where clause. For example, you can't do this:

select a, b, a + b as NewCol, NewCol + 1 as AnotherCol from table
where NewCol = 2

This will fail in both: the select statement when trying to process NewCol + 1 and also in the where statement when trying to process NewCol = 2.

There are two ways to solve this:

1) Replace the reference by the calculated value itself. Example:

select a, b, a + b as NewCol, a + b + 1 as AnotherCol from table
where  a + b = 2

2) Use an outer select statement:

select a, b, NewCol, NewCol + 1 as AnotherCol from (
    select a, b, a + b as NewCol from table
) as S
where NewCol = 2

Now, given your HUGE and not very human-friendly calculated column :) I think you should go for the last option to improve readibility:

SET @orig_lat=55.4058;  
SET @orig_lon=13.7907; 
SET @dist=10;

SELECT * FROM (
  SELECT 
    *, 
    3956 * 2 * ASIN(SQRT(POWER(SIN((@orig_lat -abs(latitude)) * pi()/180 / 2), 2) 
    + COS(@orig_lat * pi()/180 ) * COS(abs(latitude) * pi()/180) 
    * POWER(SIN((@orig_lon - longitude) * pi()/180 / 2), 2) )) as distance 
  FROM geo_kulplex.sweden_bobo
) AS S
WHERE distance < @dist
ORDER BY distance limit 10;

Edit: As @Kaii mentioned below this will result in a full table scan. Depending on the amount of data you will be processing you might want to avoid that and go for the first option, which should perform faster.



回答2:

The reason why you cant use your alias in the WHERE clause is the order in which MySQL executes things:

  1. FROM
  2. WHERE
  3. GROUP BY
  4. HAVING
  5. SELECT
  6. ORDER BY

When executing your WHERE clause, the value for your column alias is not yet calculated. This is a good thing, because it would waste a lot of performance. Imagine many (1,000,000) rows -- to use your calculation in the WHERE clause, each of those 1,000,000 would first have to be fetched and calculated so the WHERE condition can compare the calculation results to your expectation.

You can do this explicitly by either

  • using HAVING (thats the reason why HAVING has another name as WHERE - its a different thing)
  • using a subquery as illustrated by @MostyMostacho (will effectively do the same with some overhead)
  • put the complex calculation in the WHERE clause (will effectively give the same performance result as HAVING)

All those will perform almost equally bad: each row is fetched first, the distance calculated and finally filtered by distance before sending the result to the client.

You can gain much (!) better performance by mixing a simple WHERE clause for distance approximation (filtering rows to fetch first) with the more precise euclidian formula in a HAVING clause.

  1. find rows that could match the @distance = 10 condition using a WHERE clause based on simple X and Y distance (bounding box) -- this is a cheap operation.
  2. filter those results using the formula for euclidian distance in a HAVING clause -- this is an expensive operation.

Look at this query to understand what i mean:

SET @orig_lat=55.4058;
SET @orig_lon=13.7907; 
SET @dist=10;
SELECT 
    *, 
    3956 * 2 * ASIN(SQRT(POWER(SIN((@orig_lat -abs(latitude)) * pi()/180 / 2), 2)
    + COS(@orig_lat * pi()/180 ) * COS(abs(latitude) * pi()/180) 
    * POWER(SIN((@orig_lon - longitude) * pi()/180 / 2), 2) )) as distance 
FROM geo_kulplex.sweden_bobo
/* WHERE clause to pre-filter by distance approximation .. filter results 
   later with precise euclidian calculation. can use indexes. */
WHERE 
    /* i'm unsure about geo stuff ... i dont think you want a 
       distance of 10° here, please adjust this properly!! */
    latitude BETWEEN (@orig_lat - @dist) AND (@orig_lat + @dist)
    AND longitude BETWEEN (@orig_lon - @dist) AND (@orig_lon + @dist)
/* HAVING clause to filter result using the more precise euclidian distance */
HAVING distance < @dist 
ORDER BY distance limit 10;

For those who are interested in the constant:

  • 3956 is the radius of the earth in miles, so the resulting distance is measured in miles
  • 6371 is the radius of the earth in kilometers, so use this constant to measure distance in kilometers

Find more information in the wiki about the Haversine formula