I have to optimize queries by tuning basic PostgreSQL server configuration parameters. In the documentation I came across the work_mem parameter. Then I checked how changing this parameter would influence the performance of my query (which uses a sort). I measured query execution time with various work_mem settings and was very disappointed.
The table on which I perform my query contains 10,000,000 rows, and there are 430 MB of data to sort (Sort Method: external merge Disk: 430112kB).
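For reference, each measurement was taken along these lines (a sketch; I set work_mem per session):

    SET work_mem = '1MB';   -- then '5MB' and '64MB' for the other runs
    EXPLAIN ANALYZE
    SELECT ...;             -- the query shown below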
With work_mem = 1MB, the EXPLAIN ANALYZE output is:

Total runtime: 29950.571 ms (the sort takes about 19300 ms).

Sort (cost=4032588.78..4082588.66 rows=19999954 width=8)
     (actual time=22577.149..26424.951 rows=20000000 loops=1)
  Sort Key: "*SELECT* 1".n
  Sort Method: external merge Disk: 430104kB
With work_mem = 5MB:

Total runtime: 36282.729 ms (sort: 25400 ms).

Sort (cost=3485713.78..3535713.66 rows=19999954 width=8)
     (actual time=25062.383..33246.561 rows=20000000 loops=1)
  Sort Key: "*SELECT* 1".n
  Sort Method: external merge Disk: 430104kB
With work_mem = 64MB:

Total runtime: 42566.538 ms (sort: 31000 ms).

Sort (cost=3212276.28..3262276.16 rows=19999954 width=8)
     (actual time=28599.611..39454.279 rows=20000000 loops=1)
  Sort Key: "*SELECT* 1".n
  Sort Method: external merge Disk: 430104kB
Can anyone explain why performance gets worse? Or can you suggest any other ways to make this query execute faster by changing server parameters?
My query (I know it's not optimal, but I have to benchmark this kind of query):
SELECT n
FROM (
    SELECT n + 1 AS n FROM table_name
    EXCEPT
    SELECT n FROM table_name) AS q1
ORDER BY n DESC;
Full execution plan:
Sort (cost=5805421.81..5830421.75 rows=9999977 width=8) (actual time=30405.682..30405.682 rows=1 loops=1)
  Sort Key: q1.n
  Sort Method: quicksort Memory: 25kB
  ->  Subquery Scan q1 (cost=4032588.78..4232588.32 rows=9999977 width=8) (actual time=30405.636..30405.637 rows=1 loops=1)
        ->  SetOp Except (cost=4032588.78..4132588.55 rows=9999977 width=8) (actual time=30405.634..30405.634 rows=1 loops=1)
              ->  Sort (cost=4032588.78..4082588.66 rows=19999954 width=8) (actual time=23046.478..27733.020 rows=20000000 loops=1)
                    Sort Key: "*SELECT* 1".n
                    Sort Method: external merge Disk: 430104kB
                    ->  Append (cost=0.00..513495.02 rows=19999954 width=8) (actual time=0.040..8191.185 rows=20000000 loops=1)
                          ->  Subquery Scan "*SELECT* 1" (cost=0.00..269247.48 rows=9999977 width=8) (actual time=0.039..3651.506 rows=10000000 loops=1)
                                ->  Seq Scan on table_name (cost=0.00..169247.71 rows=9999977 width=8) (actual time=0.038..2258.323 rows=10000000 loops=1)
                          ->  Subquery Scan "*SELECT* 2" (cost=0.00..244247.54 rows=9999977 width=8) (actual time=0.008..2697.546 rows=10000000 loops=1)
                                ->  Seq Scan on table_name (cost=0.00..144247.77 rows=9999977 width=8) (actual time=0.006..1079.561 rows=10000000 loops=1)
Total runtime: 30496.100 ms
I posted your query plan on explain.depesz.com; have a look.
The query planner's estimates are terribly wrong in some places. Have you run ANALYZE recently?

Read the chapters in the manual on Statistics Used by the Planner and Planner Cost Constants. Pay special attention to the chapters on random_page_cost and default_statistics_target.

I would try:
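A sketch of the kind of thing I mean (raising the per-column statistics target for n is my example; you could also raise default_statistics_target globally in postgresql.conf):

    ALTER TABLE table_name ALTER COLUMN n SET STATISTICS 1000;  -- up from the default of 100
    ANALYZE table_name;                                         -- refresh planner statistics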
Given your 10M rows, you might even go higher. It depends on the distribution of your data and the actual query. Experiment. The default is 100; the maximum is 10000.
For a database of that size, a mere 1 or 5 MB of work_mem is generally not enough. Read the Postgres Wiki page on Tuning Postgres that @aleroot linked to.

Since your query needs 430104kB of memory on disk according to the EXPLAIN output, you would have to set work_mem to something like 500MB or more to allow in-memory sorting. The in-memory representation of data needs somewhat more space than the on-disk representation. You may be interested in what Tom Lane posted on that matter recently.

Increasing work_mem by just a little, like you tried, won't help much and apparently can even slow things down. I am not sure how to explain the slowdown; maybe the additional work_mem doesn't get used at all, or is only used initially while still being insufficient, yet takes memory away from other allocations.

If your memory is limited, don't set work_mem too high or you might starve competing resources. Set it high enough temporarily, just for this query, and reset it to your default afterwards:
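A minimal sketch, assuming a session-level setting and the 500MB figure from above (adjust to your data and available RAM):

    SET work_mem = '500MB';  -- headroom for the ~430 MB on-disk sort

    -- ... run the benchmark query ...

    RESET work_mem;          -- back to the server default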
Or, better yet, use SET LOCAL to set it just for the current transaction. Best run all of it in a single transaction either way.
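For example (a sketch with the same hypothetical 500MB value):

    BEGIN;
    SET LOCAL work_mem = '500MB';  -- reverts automatically at COMMIT or ROLLBACK

    -- ... run the benchmark query ...

    COMMIT;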
The EXCEPT query is equivalent to the following NOT EXISTS form, which generates a different query plan (but the same results) here (9.0.1beta something):
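A sketch of that rewrite (assuming n is unique, so the duplicate elimination that EXCEPT performs makes no difference):

    SELECT t0.n + 1 AS n
    FROM table_name t0
    WHERE NOT EXISTS (
        SELECT 1
        FROM table_name t1
        WHERE t1.n = t0.n + 1  -- successor value already present
    )
    ORDER BY n DESC;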
(a version with a recursive CTE might also be possible :-)
EDIT: the query plans, all for 100K records with 0.2 % deleted.
Original query:
NOT EXISTS version with CTE:
NOT EXISTS version without CTE:
My conclusion is that the NOT EXISTS versions cause Postgres to produce better plans.