I'm trying to use ST_SnapToGrid and then GROUP BY
the grid cells (x, y). Here is what I did first:
SELECT
COUNT(*) AS n,
ST_X(ST_SnapToGrid(geom, 50)) AS x,
ST_Y(ST_SnapToGrid(geom, 50)) AS y
FROM points
GROUP BY x, y
I don't want to recompute ST_SnapToGrid
for both x
and y
. So I changed it to use a sub-query:
SELECT
COUNT(*) AS n,
ST_X(geom) AS x,
ST_Y(geom) AS y
FROM (
SELECT
ST_SnapToGrid(geom, 50) AS geom
FROM points
) AS tmp
GROUP BY x, y
But when I run EXPLAIN
, both of these queries have the exact same execution plan:
GroupAggregate (...)
-> Sort (...)
Sort Key: (st_x(st_snaptogrid(points.geom, 0::double precision))), (st_y(st_snaptogrid(points.geom, 0::double precision)))
-> Seq Scan on points (...)
Question: Will PostgreSQL reuse the result value of ST_SnapToGrid()
?
If not, is there a way to make it do this?
Test timing
You don't see the evaluation of individual functions per row in the EXPLAIN
output.
Test with EXPLAIN ANALYZE
to get actual query times to compare overall effectiveness. Run a couple of times to rule out caching artifacts. For simple queries like this, you get more reliable numbers for the total runtime with:
EXPLAIN (ANALYZE, TIMING OFF) SELECT ...
Requires Postgres 9.2+. Per documentation:
TIMING
Include actual startup time and time spent in each node in the output. The overhead of repeatedly reading the system clock can slow
down the query significantly on some systems, so it may be useful to
set this parameter to FALSE
when only actual row counts, and not exact
times, are needed. Run time of the entire statement is always
measured, even when node-level timing is turned off with this option.
This parameter may only be used when ANALYZE
is also enabled. It
defaults to TRUE
.
Prevent repeated evaluation
Normally, expressions in a subquery are evaluated once. But Postgres can collapse trivial subqueries if it thinks that will be faster.
To introduce an optimization barrier, you could use a CTE instead of the subquery. This guarantees that Postgres computes ST_SnapToGrid(geom, 50)
once only:
WITH cte AS (
SELECT ST_SnapToGrid(geom, 50) AS geom1
FROM points
)
SELECT COUNT(*) AS n
, ST_X(geom1) AS x
, ST_Y(geom1) AS y
FROM cte
GROUP BY geom1; -- see below
However, this it's probably slower than a subquery due to more overhead for a CTE. The function call is probably very cheap. Generally, Postgres knows better how to optimize a query plan. Only introduce such an optimization barrier if you know better.
Simplify
I changed the name of the computed point in the subquery / CTE to geom1
to clarify it's different from the original geom
. That helps to clarify the more important thing here:
GROUP BY geom1
instead of:
GROUP BY x, y
That's obviously cheaper - and may have an influence on whether the function call is repeated. So, this is probably fastest:
SELECT COUNT(*) AS n
, ST_X(ST_SnapToGrid(geom, 50)) AS x
, ST_y(ST_SnapToGrid(geom, 50)) AS y
FROM points
GROUP BY ST_SnapToGrid(geom, 50); -- same here!
Or maybe this:
SELECT COUNT(*) AS n
, ST_X(geom1) AS x
, ST_y(geom1) AS y
FROM (
SELECT ST_SnapToGrid(geom, 50) AS geom1
FROM points
) AS tmp
GROUP BY geom1;
Test all three with EXPLAIN ANALYZE
or EXPLAIN (ANALYZE, TIMING OFF)
and see for yourself. Testing >> guessing.