Is my sort key being used?

I have a table with a column updated_at, which is a sort key. After running both VACUUM and ANALYZE on the table, this is the query plan I get when filtering on updated_at:

EXPLAIN
SELECT *
FROM my_table
WHERE updated_at > '2018-01-01';

QUERY PLAN
XN Seq Scan on my_table  (cost=0.00..0.00 rows=1 width=723)
  Filter: (updated_at > '2018-01-01 00:00:00'::timestamp without time zone)

My understanding is that the query execution engine is doing a sequential scan of the table despite the sort key and thus the sort key is not doing anything?

标签： amazon-web-services amazon-redshift

2条回答

祖国的老花朵

2楼-- · 2019-08-13 00:49

Have a look at the "Analyzing the Query Summary" section in the Redshift docs. It shows how you can use the SVL_QUERY_SUMMARY view to observe extremely detailed metrics for each query execution.

For the most effective sort key usage you should see the rr_scan (range restricted scan) column set to t and the num_rows_pre_filter should be fairly close to the rows count. num_rows_pre_filter is the number of rows scanned off of disk before applying your predicate filter. NB: "Fairly close" in this context will vary depending on your specific data.

SELECT stm,seg,step,TRIM(LEFT(label,30))"label"
      ,rows_pre_filter,rows,avgtime,bytes,is_rrscan 
FROM svl_query_summary 
WHERE query  = 123456
ORDER BY stm ,seg ,step; 

| stm | seg | step |             label             | rows_pre_filter | rows | avgtime | bytes | is_rrscan |
|-----|-----|------|-------------------------------|-----------------|------|---------|-------|-----------|
|   0 |   0 |    0 | scan   tbl=428142 name=my_tbl |          103665 |    6 |   52814 |  1273 | t         |
|   0 |   0 |    1 | project                       |               0 |    6 |   52814 |     0 | f         |
|   0 |   0 |    2 | sort   tbl=303                |               0 |    6 |   52814 |  1288 | f         |
|   1 |   1 |    0 | scan   tbl=303 name=Internal  |               0 |    6 |      74 |  1288 | f         |
|   1 |   1 |    1 | return                        |               0 |    6 |      74 |     0 | f         |
|   1 |   2 |    0 | merge                         |               0 |    0 |     275 |     0 | f         |
|   1 |   2 |    1 | project                       |               0 |    6 |     275 |     0 | f         |
|   1 |   2 |    2 | return                        |               0 |    6 |     275 |  1387 | f         |

0人赞添加讨论(0) 举报

冷血范

3楼-- · 2019-08-13 00:55

Sequential scans are perfectly normal on Amazon Redshift, since it does not use indexes.

The system is smart enough to skip-over blocks that do not contain desired values, since each block (which contains data for only one column) stores the minimum and maximum of each value in the block. Thus, a block with all dates prior to 2018-01-01 would be automatically skipped.

This will not appear in the EXPLAIN plan, since it depends on the actual data stored in each block.

The best thing is to run some tests and see whether it is running quickly, as you would expect. You would want to run SET enable_result_cache_for_session TO OFF to stop caching from impacting results.

Also, try to avoid situations where the SORT KEY is cast to a different type. In the above example, if the column is a DATE but the query is using it as a TIMESTAMP, then it might not be able to skip over blocks since it has to transform the value after it is read from disk. Therefore, it might work better if there WHERE is using the exact same datatype.

0人赞添加讨论(0) 举报

Is my sort key being used?

采纳回答

编辑标签

举报内容

检举类型

检举原因

检举说明(必填)

打开微信“扫一扫”，打开网页后点击屏幕右上角分享按钮

付费偷看金额在0.1-10元之间