I have 19 years of Oracle and MySQL experience (DBA and dev) and I am new to Postgres, so I may be missing something obvious. But I can not get this query to do what I want.
NOTE: This query is running on an EngineYard Postgres instance. I am not immediately aware of the parameters it has set up. Also, columns applicable_type and status in the items table are of extension type citext.
The following query can take in excess of 60 seconds to return rows:
SELECT items.item_id,
CASE when items.sku is null then items.title else concat(item.title, ' (SKU: ', items.sku, ')') END title,
items.listing_status, items.updated_at, items.id,
items.sku, count(details.id) detail_count
FROM "items" LEFT OUTER JOIN details ON details.applicable_id = items.id
and details.applicable_type = 'Item'
and details.status = 'Valid'
LEFT OUTER JOIN products ON products.id = items.product_id
WHERE "items"."user_id" = 3
GROUP BY items.id
ORDER BY title asc
LIMIT 25 OFFSET 0
The details
table contains 6.5M rows. The LEFT OUTER JOIN
to it does a sequential scan on applicable_id
. Cardinality-wise, that column has 120K distinct possibilities across 6.5M rows.
I have a btree index on details
with the following columns:
applicable_id
applicable_type
status
but really, applicable_id
and applicable_type
have low cardinality.
My explain analyze
looks like this:
Limit (cost=247701.59..247701.65 rows=25 width=118) (actual time=28781.090..28781.098 rows=25 loops=1)
-> Sort (cost=247701.59..247703.05 rows=585 width=118) (actual time=28781.087..28781.090 rows=25 loops=1)
Sort Key: (CASE WHEN (items.sku IS NULL) THEN (items.title)::text ELSE pg_catalog.concat(items.title, ' (SKU: ', items.sku, ')') END)
Sort Method: top-N heapsort Memory: 30kB
-> HashAggregate (cost=247677.77..247685.08 rows=585 width=118) (actual time=28779.658..28779.974 rows=664 loops=1)
-> Hash Right Join (cost=2069.47..247645.64 rows=6425 width=118) (actual time=17798.898..28742.395 rows=60047 loops=1)
Hash Cond: (details.applicable_id = items.id)
-> Seq Scan on details (cost=0.00..220591.65 rows=6645404 width=8) (actual time=6.272..27702.717 rows=6646205 loops=1)
Filter: ((applicable_type = 'Listing'::citext) AND (status = 'Valid'::citext))
Rows Removed by Filter: 942
-> Hash (cost=2062.16..2062.16 rows=585 width=118) (actual time=1.286..1.286 rows=664 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 90kB
-> Bitmap Heap Scan on items (cost=16.87..2062.16 rows=585 width=118) (actual time=0.157..0.748 rows=664 loops=1)
Recheck Cond: (user_id = 3)
-> Bitmap Index Scan on index_items_on_user_id (cost=0.00..16.73 rows=585 width=0) (actual time=0.141..0.141 rows=664 loops=1)
Index Cond: (user_id = 3)
Total runtime: 28781.238 ms
I think you need an index on applicable_id column only (without applicable_type, status columns). You may also need to increase default_statistics_target param (system wide or better for applicable_id column only) so postgresql had better guess about number of rows in joining.
Do you have an index on the expression that yields the title? Better yet, one on (user_id, title_expression).
If not, that might be an excellent thing to add, so as to nestloop through the first 25 rows of an index scan, seeing that Postgres can't reasonably guess which random 25 rows (hence the seq scan you're currently getting on the joined table) will be needed.