Why is Postgres scanning a huge table instead of u

I noticed one of my SQL queries is much slower than I expected it to be, and it turns out that the query planner is coming up with a plan that seems really bad to me. My query looks like this:

select A.style, count(B.x is null) as missing, count(*) as total
  from A left join B using (id, type)
  where A.country_code in ('US', 'DE', 'ES')
  group by A.country_code, A.style
  order by A.country_code, total

B has a (type, id) index, and A has a (country_code, style) index. A is much smaller than B: 250K rows in A vs 100M in B.

So, I expected the query plan to look something like:

Use the index on A to select just those rows with appropriate country_code
Left join with B, to find the matching row (if any) based on its (type, id) index
Group things according to country_code and style
Add up the counts

But the query planner decides the best way to do this is a sequential scan on B, and then a right join against A. I can't fathom why that is; does anyone have an idea? Here's the actual query plan it generated:

 Sort  (cost=14283513.27..14283513.70 rows=171 width=595)
   Sort Key: a.country_code, (count(*))
   ->  HashAggregate  (cost=14283505.22..14283506.93 rows=171 width=595)
         ->  Hash Right Join  (cost=8973.71..14282810.03 rows=55615 width=595)
               Hash Cond: ((b.type = a.type) AND (b.id = a.id))
               ->  Seq Scan on b (cost=0.00..9076222.44 rows=129937844 width=579)
               ->  Hash  (cost=8139.49..8139.49 rows=55615 width=28)
                     ->  Bitmap Heap Scan on a  (cost=1798.67..8139.49 rows=55615 width=28)
                           Recheck Cond: ((country_code = ANY ('{US,DE,ES}'::bpchar[])))
                           ->  Bitmap Index Scan on a_country_code_type_idx  (cost=0.00..1784.76 rows=55615 width=0)
                                 Index Cond: ((country_code = ANY ('{US,DE,ES}'::bpchar[])))

Edit: following a clue from the comments on another question, I tried it with SET ENABLE_SEQSCAN TO OFF;, and the query runs ten times as fast. Obviously I don't want to permanently disable sequential scans, but this helps confirm my otherwise-baseless guess that the sequential scan is not the best plan available.

标签： sql performance postgresql indexing sql-execution-plan

2条回答

放荡不羁爱自由

2楼-- · 2019-06-26 11:30

If the query is actually faster with an index scan as your added test proves, then it's typically one or both of these:

Your statistics are off or not precise enough to cover irregular data distribution.
Your cost settings are off, which Postgres uses to base its cost estimation on.

Details for both in this closely related answer:

Keep PostgreSQL from sometimes choosing a bad query plan

0人赞添加讨论(0) 举报

\"骚年 ilove

3楼-- · 2019-06-26 11:31

Probably you db has right. It looks there are 55k matching rows for the first filter. Running this amount of index scan iterations can be extremely time consuming. Usually hash joins are faster for not so selective things.

Anyway you can try a few things:

remove the left keyword and use inner join.
analyze your tables.

0人赞添加讨论(0) 举报

Why is Postgres scanning a huge table instead of u

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间