I have two equivalent queries which extracts the average distance between buildings (table a) and the nearest highway (highways in table v) in a specific district (ace) and city (pro_com).
This is the CTE version
WITH subq AS (
SELECT a.n, a.geom as g1, unnest(ARRAY(SELECT v.geom as g2
FROM atlas_sezioni2 as v
where v.code = '12230' and a.pro_com = v.pro_com and a.code <> v.code
ORDER BY a.geom <-> v.geom LIMIT 15)) as g2
FROM atlas_sezioni2 a
where a.pro_com = 15146 and a.ace = 1 and a.code IN('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')
)
select avg(dist) from (
select distinct on(n) n, dist
from (
SELECT n, ST_Distance_Sphere(g1, g2) as dist FROM subq
) disttable
order by n, dist asc
) final;
where in the CTE I extract the 15 nearest highways and calculate the distance, in order to use GIST indexes (http://workshops.boundlessgeo.com/postgis-intro/knn.html). And the CTE's explain:
Aggregate (cost=37342.10..37342.11 rows=1 width=8)
CTE subq
-> Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 a (cost=0.29..29987.90 rows=20900 width=236211)
Index Cond: (((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])) AND (ace = 1) AND (pro_com = 15146::numeric))
SubPlan 1
-> Limit (cost=141.04..141.08 rows=15 width=236190)
-> Sort (cost=141.04..141.21 rows=69 width=236190)
Sort Key: ((a.geom <-> v.geom))
-> Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 v (cost=0.28..139.35 rows=69 width=236190)
Index Cond: (((code)::text = '12230'::text) AND (a.pro_com = pro_com))
Filter: ((a.code)::text <> (code)::text)
-> Unique (cost=7247.20..7351.70 rows=200 width=72)
-> Sort (cost=7247.20..7299.45 rows=20900 width=72)
Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
-> CTE Scan on subq (cost=0.00..5747.50 rows=20900 width=72)
(15 rows)
This is the equivalent with subqueries:
select avg(dist) from (
select distinct on(n) n, dist
from (
SELECT n, ST_Distance_Sphere(g1, g2) as dist FROM (
SELECT a.n, a.geom as g1, unnest(ARRAY(SELECT v.geom as g2
FROM atlas_sezioni2 as v
where v.code = '12230' and a.pro_com = v.pro_com and a.code <> v.code
ORDER BY a.geom <-> v.geom LIMIT 15)) as g2
FROM atlas_sezioni2 a
where a.pro_com = 15146 and a.ace = 1 and a.code IN('11100', '11210', '11220', '11230', '11240', '11300', '12100', '14200')
) subq
) disttable
order by n, dist asc
) final
and its explain
Aggregate (cost=6366298.35..6366298.36 rows=1 width=8)
-> Unique (cost=6365932.60..6366037.10 rows=20900 width=236230)
-> Sort (cost=6365932.60..6365984.85 rows=20900 width=236230)
Sort Key: subq.n, (_st_distance(geography(subq.g1), geography(subq.g2), 0::double precision, false))
-> Subquery Scan on subq (cost=0.29..35526.40 rows=20900 width=236230)
-> Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 a (cost=0.29..29987.90 rows=20900 width=236211)
Index Cond: (((code)::text = ANY ('{11100,11210,11220,11230,11240,11300,12100,14200}'::text[])) AND (ace = 1) AND (pro_com = 15146::numeric))
SubPlan 1
-> Limit (cost=141.04..141.08 rows=15 width=236190)
-> Sort (cost=141.04..141.21 rows=69 width=236190)
Sort Key: ((a.geom <-> v.geom))
-> Index Scan using atlas_sezioni2_code_ace_pro_com_n_idx on atlas_sezioni2 v (cost=0.28..139.35 rows=69 width=236190)
Index Cond: (((code)::text = '12230'::text) AND (a.pro_com = pro_com))
Filter: ((a.code)::text <> (code)::text)
(14 rows)
I know CTEs are boundary fences for optimization (Postgres doesn't optmize between CTEs and queries that are outside them), but this is strange. Why is the performance blowing out in this way?