Why does Flink SQL use a cardinality estimate of 1

Posted 2020-04-17 05:22

I wasn't sure why the logical plan wasn't correctly evaluated in this example.

I dug deeper into the Flink code base and checked what happens when Calcite estimates the number of rows for the query in question. For some reason, it always returns 100 for any table source.

In Flink, while the program plan is being created, TableEnvironment.runVolcanoPlanner invokes the VolcanoPlanner class for each transformation rule. The planner tries to optimise the plan and computes estimates by calling RelMetadataQuery.getRowCount.

I reproduced the problem with a failing test that should assert a row count of 0 for the relation table 'S', but it always returns 100.

Why is this happening? Does anyone have an answer to this issue?

Tags: java scala apache-flink apache-calcite flink-sql

1 answer

Melony?

Reply #2 · 2020-04-17 05:45

In the current version (1.7.1, Jan 2019), Flink's relational APIs (Table API and SQL) do not attempt to estimate the cardinality of base tables. Hence, Calcite falls back to its default value, which is 100.
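
To make the fallback concrete, here is a minimal, library-free sketch of the behaviour described above. It is not Flink or Calcite code: the class RowCountFallbackSketch, the TableSource type and the estimateRowCount method are hypothetical names made up for illustration. The only thing taken from the answer is the default of 100 that applies when a table supplies no row count.

```java
import java.util.OptionalDouble;

// Illustrative model only; none of these types exist in Flink or Calcite.
class RowCountFallbackSketch {

    // Default used when a table exposes no statistics (100, per the answer above).
    static final double DEFAULT_ROW_COUNT = 100d;

    // Hypothetical stand-in for a table that may or may not carry a row count.
    static final class TableSource {
        final String name;
        final OptionalDouble knownRowCount;

        TableSource(String name, OptionalDouble knownRowCount) {
            this.name = name;
            this.knownRowCount = knownRowCount;
        }
    }

    // Returns the table's own row count if it has one, otherwise the default.
    static double estimateRowCount(TableSource table) {
        return table.knownRowCount.orElse(DEFAULT_ROW_COUNT);
    }

    public static void main(String[] args) {
        // An empty table 'S' registered without statistics: the planner cannot
        // know it is empty, so the estimate is the default rather than 0.
        TableSource withoutStats = new TableSource("S", OptionalDouble.empty());

        // The same table with an explicitly supplied row count of 0.
        TableSource withStats = new TableSource("S", OptionalDouble.of(0d));

        System.out.println(estimateRowCount(withoutStats)); // prints 100.0
        System.out.println(estimateRowCount(withStats));    // prints 0.0
    }
}
```

Under this model, the failing test in the question is expected: the row count of 0 is only visible once the data is read, while the planner's estimate is produced earlier and relies solely on whatever statistics the table declares.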

This works fine for basic optimizations like filter and projection push-down and is currently sufficient because Flink does not (yet) reorder joins.

The only way to inject cardinality estimates for tables is via an ExternalCatalog.
