How to push down project, filter, aggregation to T

Posted 2020-05-24 05:29

Question:

I am using Apache Calcite to implement a distributed OLAP system whose data source is an RDBMS. I want to push down the project/filter/aggregation nodes in the RelNode tree to MyTableScan, which extends TableScan. MyTableScan would hold a RelBuilder that collects the pushed-down RelNodes, and that RelBuilder would finally generate the query sent to the source database. At the same time, the project/filter/aggregation nodes in the original RelNode tree should be removed or modified.

As far as I know, Calcite does not support this feature.

Current limitations: The JDBC adapter currently only pushes down table scan operations; all other processing (filtering, joins, aggregations and so forth) occurs within Calcite. Our goal is to push down as much processing as possible to the source system, translating syntax, data types and built-in functions as we go. If a Calcite query is based on tables from a single JDBC database, in principle the whole query should go to that database. If tables are from multiple JDBC sources, or a mixture of JDBC and non-JDBC, Calcite will use the most efficient distributed query approach that it can.

In my opinion, a RelOptRule may be a good choice. Unfortunately, when I create a new RelOptRule, I cannot easily find a node's parent in order to remove the node.

Is RelOptRule a good choice? Does anyone have a good idea for implementing this feature?

Thanks.

Answer 1:

Creating a new RelOptRule is the way to go. Note that you shouldn't try to remove nodes directly inside a rule. Instead, you match a subtree that contains the nodes you want to replace (for example, a Filter on top of a TableScan), and then replace that entire subtree with an equivalent node that pushes down the filter.

This is normally handled by creating a subclass of the relevant operation which conforms to the calling convention of the particular adapter. For example, in the Cassandra adapter, there is a CassandraFilterRule which matches a LogicalFilter on top of a CassandraTableScan. The convert function then constructs a CassandraFilter instance. The CassandraFilter instance sets up the necessary information so that when the query is actually issued, the filter is available.
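The idea of replacing a whole matched subtree can be sketched without Calcite at all. The following is a toy model, not Calcite's API: `Node`, `Scan`, `Filter`, and `pushFilterIntoScan` are hypothetical names invented for illustration. It shows a Filter-on-Scan subtree being swapped for a single scan that remembers the predicate, so the predicate is available when the query for the source database is generated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model of subtree replacement; none of these types are Calcite classes. */
final class FilterPushDownSketch {

  /** Minimal stand-in for a relational node. */
  interface Node {}

  /** A scan that remembers the predicates pushed into it. */
  static final class Scan implements Node {
    final String table;
    final List<String> pushedPredicates;

    Scan(String table, List<String> pushedPredicates) {
      this.table = table;
      this.pushedPredicates =
          Collections.unmodifiableList(new ArrayList<>(pushedPredicates));
    }

    /** Renders the SQL that would be sent to the source database. */
    String toSql() {
      String sql = "SELECT * FROM " + table;
      if (!pushedPredicates.isEmpty()) {
        sql += " WHERE " + String.join(" AND ", pushedPredicates);
      }
      return sql;
    }
  }

  /** A filter that would otherwise be evaluated above its input. */
  static final class Filter implements Node {
    final String predicate;
    final Node input;

    Filter(String predicate, Node input) {
      this.predicate = predicate;
      this.input = input;
    }
  }

  /**
   * The "rule": a Filter directly on top of a Scan is replaced by one Scan
   * that carries the predicate. Any other node is returned unchanged.
   */
  static Node pushFilterIntoScan(Node node) {
    if (node instanceof Filter && ((Filter) node).input instanceof Scan) {
      Filter filter = (Filter) node;
      Scan scan = (Scan) filter.input;
      List<String> predicates = new ArrayList<>(scan.pushedPredicates);
      predicates.add(filter.predicate);
      return new Scan(scan.table, predicates);
    }
    return node;
  }

  /** Builds Filter(Scan), applies the rule, and returns the generated SQL. */
  static String demo() {
    Node plan = new Filter("age > 30", new Scan("emp", Collections.<String>emptyList()));
    Node rewritten = pushFilterIntoScan(plan);
    return ((Scan) rewritten).toSql();
  }

  public static void main(String[] args) {
    System.out.println(demo());
  }
}
```

The rule never edits the parent of the Filter. It only returns a replacement for the subtree it matched, which is the same division of labor the answer describes between a rule and the planner.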

Browsing some of the code for the Cassandra, MongoDB, or Elasticsearch adapters may be helpful as they are on the simpler side. I would also suggest bringing this to the mailing list as you'll probably get more detailed advice there.



Answer 2:

I have created some RelOptRules to push down the Project/Filter/Aggregate RelNodes above a TableScan. They may be helpful to others.

A RelOptRule defines a pattern that matches subtrees within the whole RelNode tree. When a subtree matches, the planner calls the rule's onMatch method.

In the onMatch method, we can create a new RelNode and call the transformTo method to replace the matched subtree.

For example:

Project
  |
Filter
  |
TableScan

The PushDownFilter rule is as follows:

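import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.plan.RelOptRuleOperand;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.core.TableScan;

public class PushDownFilter extends RelOptRule {

  // Matches a Filter whose direct input is a TableScan.
  public static final PushDownFilter INSTANCE =
      new PushDownFilter(
          operand(
              Filter.class,
              operand(TableScan.class, none())),
          "filter_tableScan");

  public PushDownFilter(RelOptRuleOperand operand, String description) {
    super(operand, "Push_down_rule:" + description);
  }

  @Override
  public void onMatch(RelOptRuleCall call) {
    // The operand matches any Filter, so avoid casting to LogicalFilter.
    Filter filter = call.rel(0);
    TableScan tableScan = call.rel(1);
    // Push down the filter: a real rule would build a scan that carries
    // filter.getCondition(). Here the original scan stands in for it.
    call.transformTo(tableScan);
  }
}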

This rule matches the Filter->TableScan subtree and then calls the onMatch method, which simply calls transformTo with the tableScan. As a result, Filter->TableScan is replaced by TableScan, and the whole RelNode tree becomes:

Project
  |
TableScan

Note that the RelDataType (row type) of the new RelNode must be equal to that of the matched subtree.
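To see why the row-type constraint matters, consider pushing down a Project instead of a Filter. The sketch below is a toy model, not Calcite's API: `Node`, `Scan`, `Project`, `sameRowType`, and `pushProjectIntoScan` are hypothetical names, and a list of column names stands in for a RelDataType. It shows that the unmodified scan is not a valid replacement for Project->Scan, while a scan narrowed to the projected columns is.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the row-type constraint; none of these types are Calcite classes. */
final class RowTypeCheckSketch {

  /** Minimal stand-in for a relational node. */
  interface Node {
    /** Output column names, standing in for a RelDataType. */
    List<String> rowType();
  }

  /** A scan that reads only the listed columns from the source table. */
  static final class Scan implements Node {
    final String table;
    final List<String> columns;

    Scan(String table, List<String> columns) {
      this.table = table;
      this.columns = new ArrayList<>(columns);
    }

    @Override public List<String> rowType() {
      return columns;
    }

    String toSql() {
      return "SELECT " + String.join(", ", columns) + " FROM " + table;
    }
  }

  /** A projection that keeps a subset of its input's columns. */
  static final class Project implements Node {
    final List<String> columns;
    final Node input;

    Project(List<String> columns, Node input) {
      this.columns = new ArrayList<>(columns);
      this.input = input;
    }

    @Override public List<String> rowType() {
      return columns;
    }
  }

  /** True when both nodes expose the same columns in the same order. */
  static boolean sameRowType(Node a, Node b) {
    return a.rowType().equals(b.rowType());
  }

  /** Replaces Project(Scan) with a narrower Scan; leaves other plans unchanged. */
  static Node pushProjectIntoScan(Project project) {
    if (!(project.input instanceof Scan)) {
      return project;
    }
    Scan scan = (Scan) project.input;
    Scan replacement = new Scan(scan.table, project.columns);
    // Check the row type before accepting the replacement.
    if (!sameRowType(project, replacement)) {
      throw new IllegalStateException("row type changed by rewrite");
    }
    return replacement;
  }

  public static void main(String[] args) {
    Scan scan = new Scan("emp", Arrays.asList("id", "name", "age"));
    Project project = new Project(Arrays.asList("name"), scan);
    // The original scan exposes more columns than the Project does.
    System.out.println(sameRowType(project, scan));
    Node rewritten = pushProjectIntoScan(project);
    System.out.println(sameRowType(project, rewritten));
    System.out.println(((Scan) rewritten).toSql());
  }
}
```

Running `main` prints `false`, `true`, and `SELECT name FROM emp`. A Filter does not change the columns of its input, which is why the PushDownFilter example can hand the scan to transformTo without tripping this check.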

Calcite also provides some built-in rules, for example FilterJoinRule, FilterTableScanRule, and so on.