I am using Apache Calcite to implement a distributed OLAP system whose data source is an RDBMS. I want to push down the project/filter/aggregation in the RelNode tree into MyTableScan, which extends TableScan. In MyTableScan, I use a RelBuilder to get the pushed RelNode. Finally, that RelNode is used to generate the query sent to the source database. At the same time, the project/filter/aggregation in the original RelNode tree should be moved or modified.
As far as I know, Calcite does not support this feature:
Current limitations: The JDBC adapter currently only pushes down table scan operations; all other processing (filtering, joins, aggregations and so forth) occurs within Calcite. Our goal is to push down as much processing as possible to the source system, translating syntax, data types and built-in functions as we go. If a Calcite query is based on tables from a single JDBC database, in principle the whole query should go to that database. If tables are from multiple JDBC sources, or a mixture of JDBC and non-JDBC, Calcite will use the most efficient distributed query approach that it can.
In my opinion, RelOptRule may be a good choice. Unfortunately, when I create a new RelOptRule, I cannot easily find the parent node in order to remove a node.
Is RelOptRule a good choice? Does anyone have a good idea for implementing this feature?
Thanks.
Creating a new RelOptRule is the way to go. Note that you shouldn't try to remove any nodes directly inside a rule. Instead, you match a subtree that contains the nodes you want to replace (for example, a Filter on top of a TableScan) and then replace that entire subtree with an equivalent node that pushes down the filter.
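The following toy model in plain Java shows why a rule never needs the parent node. It is a minimal, self-contained sketch. PlanNode, ToyScan, ToyFilter, ToyProject and FilterIntoScanRule are hypothetical classes, not Calcite API.

The rule looks only at the matched Filter -> Scan pair and returns one replacement node. The driver rebuilds each parent around its rewritten child. This is loosely analogous to what the planner takes care of when a rule calls transformTo.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy plan nodes (not Calcite classes); like RelNodes, they are immutable.
abstract class PlanNode {
  abstract String explain();
}

final class ToyScan extends PlanNode {
  final String table;
  final List<String> pushedFilters; // conditions the source database will evaluate

  ToyScan(String table, List<String> pushedFilters) {
    this.table = table;
    this.pushedFilters = pushedFilters;
  }

  @Override
  String explain() {
    return "Scan(" + table + (pushedFilters.isEmpty() ? "" : ", filters=" + pushedFilters) + ")";
  }
}

final class ToyFilter extends PlanNode {
  final String condition;
  final PlanNode input;

  ToyFilter(String condition, PlanNode input) {
    this.condition = condition;
    this.input = input;
  }

  @Override
  String explain() {
    return "Filter(" + condition + ", " + input.explain() + ")";
  }
}

final class ToyProject extends PlanNode {
  final List<String> fields;
  final PlanNode input;

  ToyProject(List<String> fields, PlanNode input) {
    this.fields = fields;
    this.input = input;
  }

  @Override
  String explain() {
    return "Project(" + fields + ", " + input.explain() + ")";
  }
}

final class FilterIntoScanRule {
  // Matches the pattern Filter -> Scan and returns the replacement, or null if there is no match.
  static PlanNode onMatch(PlanNode node) {
    if (node instanceof ToyFilter && ((ToyFilter) node).input instanceof ToyScan) {
      ToyFilter filter = (ToyFilter) node;
      ToyScan scan = (ToyScan) filter.input;
      List<String> filters = new ArrayList<>(scan.pushedFilters);
      filters.add(filter.condition);
      return new ToyScan(scan.table, filters); // the whole subtree becomes one scan
    }
    return null;
  }

  // Bottom-up driver: each parent is rebuilt around its rewritten child,
  // so the rule itself never needs a reference to the parent.
  static PlanNode rewrite(PlanNode node) {
    if (node instanceof ToyProject) {
      ToyProject project = (ToyProject) node;
      return new ToyProject(project.fields, rewrite(project.input));
    }
    if (node instanceof ToyFilter) {
      ToyFilter filter = (ToyFilter) node;
      PlanNode rebuilt = new ToyFilter(filter.condition, rewrite(filter.input));
      PlanNode replacement = onMatch(rebuilt);
      return replacement != null ? replacement : rebuilt;
    }
    return node; // scans are leaves
  }

  public static void main(String[] args) {
    PlanNode plan = new ToyProject(Arrays.asList("name"),
        new ToyFilter("age > 30", new ToyScan("emp", new ArrayList<String>())));
    // prints Project([name], Scan(emp, filters=[age > 30]))
    System.out.println(rewrite(plan).explain());
  }
}
```

Stacked filters are handled by the same bottom-up pass. The inner Filter -> Scan collapses first, which creates a new Filter -> Scan match for the outer filter.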
This is normally handled by creating a subclass of the relevant operation which conforms to the calling convention of the particular adapter. For example, the Cassandra adapter has a CassandraFilterRule which matches a LogicalFilter on top of a CassandraTableScan. Its convert function then constructs a CassandraFilter instance. The CassandraFilter instance sets up the necessary information so that the filter is available when the query is actually issued.
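The same idea can be sketched without Calcite as an immutable object. Each push-down step extends it, and it is only turned into a query when the query is executed. PushedQuery and its methods are hypothetical names used for illustration.

The sketch makes two assumptions. First, operators are absorbed bottom-up: Filter first, then Project or Aggregate. Second, SQL is built by plain string concatenation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, immutable record of the work absorbed into a custom scan
// (a stand-in for what a node such as CassandraFilter keeps; not a Calcite API).
final class PushedQuery {
  final String table;
  final List<String> conditions; // ANDed together in WHERE
  final List<String> selectList; // empty means SELECT *
  final List<String> groupBy;

  PushedQuery(String table) {
    this(table, Collections.<String>emptyList(), Collections.<String>emptyList(),
        Collections.<String>emptyList());
  }

  private PushedQuery(String table, List<String> conditions,
      List<String> selectList, List<String> groupBy) {
    this.table = table;
    this.conditions = conditions;
    this.selectList = selectList;
    this.groupBy = groupBy;
  }

  // Absorb a Filter that sits directly on the scan.
  PushedQuery withFilter(String condition) {
    List<String> merged = new ArrayList<>(conditions);
    merged.add(condition);
    return new PushedQuery(table, merged, selectList, groupBy);
  }

  // Absorb a Project; assumes no Aggregate has been absorbed yet.
  PushedQuery withProject(List<String> columns) {
    return new PushedQuery(table, conditions, columns, groupBy);
  }

  // Absorb an Aggregate; a Filter above it would need HAVING, which is not modeled.
  PushedQuery withAggregate(List<String> keys, List<String> aggCalls) {
    List<String> select = new ArrayList<>(keys);
    select.addAll(aggCalls);
    return new PushedQuery(table, conditions, select, keys);
  }

  // Only called when the query is actually issued to the source database.
  String toSql() {
    StringBuilder sql = new StringBuilder("SELECT ")
        .append(selectList.isEmpty() ? "*" : String.join(", ", selectList))
        .append(" FROM ").append(table);
    if (!conditions.isEmpty()) {
      sql.append(" WHERE ").append(String.join(" AND ", conditions));
    }
    if (!groupBy.isEmpty()) {
      sql.append(" GROUP BY ").append(String.join(", ", groupBy));
    }
    return sql.toString();
  }

  public static void main(String[] args) {
    String sql = new PushedQuery("emp")
        .withFilter("age > 30")
        .withAggregate(Arrays.asList("dept"), Arrays.asList("COUNT(*)"))
        .toSql();
    // prints SELECT dept, COUNT(*) FROM emp WHERE age > 30 GROUP BY dept
    System.out.println(sql);
  }
}
```

In a real adapter, hand-built strings like these are usually replaced by Calcite's RelToSqlConverter plus a SqlDialect, which turn a RelNode into SQL for a specific database.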
Browsing some of the code for the Cassandra, MongoDB, or Elasticsearch adapters may be helpful as they are on the simpler side. I would also suggest bringing this to the mailing list as you'll probably get more detailed advice there.
I have created some RelOptRules to push down the Project/Filter/Aggregate RelNodes sitting above a TableScan. Maybe this is helpful to others.
A RelOptRule defines a pattern that matches subtrees within the whole RelNode tree. When the pattern matches, the planner calls the rule's onMatch method.
In the onMatch method, we can create a new RelNode and call the transformTo method to replace the matched subtree.
For example:
Project
|
Filter
|
TableScan
The PushDownFilter rule is as follows:
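import org.apache.calcite.plan.RelOptRule;
import org.apache.calcite.plan.RelOptRuleCall;
import org.apache.calcite.plan.RelOptRuleOperand;
import org.apache.calcite.rel.core.Filter;
import org.apache.calcite.rel.core.TableScan;

public class PushDownFilter extends RelOptRule {
  public PushDownFilter(RelOptRuleOperand operand, String description) {
    super(operand, "Push_down_rule:" + description);
  }

  // Matches a Filter whose input is a TableScan (the scan itself has no inputs).
  public static final PushDownFilter INSTANCE =
      new PushDownFilter(
          operand(
              Filter.class,
              operand(TableScan.class, none())),
          "filter_tableScan");

  @Override
  public void onMatch(RelOptRuleCall call) {
    // The operand matches any Filter subclass, so cast to Filter rather than LogicalFilter.
    Filter filter = (Filter) call.rels[0];
    TableScan tableScan = (TableScan) call.rels[1];
    // push down filter: build a scan that carries filter.getCondition() here
    call.transformTo(tableScan);
  }
}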
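This rule matches the Filter->TableScan subtree and then calls the onMatch method. The method only calls transformTo with the tableScan, so the Filter->TableScan subtree is replaced by the TableScan. Because this example passes the unchanged tableScan, the filter condition is simply discarded. A real push-down rule must pass a scan that records the condition so that the source database applies it. The whole RelNode tree then becomes: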
Project
|
TableScan
Note that the RelDataType (row type) of the new RelNode must be equal to that of the matched subtree.
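Here is a toy illustration of the row-type requirement, again using hypothetical classes rather than Calcite's. The transformTo below rejects a replacement whose field list differs from the matched subtree.

- Replacing Filter -> Scan with the bare scan passes, because a filter keeps its input's row type.
- Replacing Project -> Scan with the full scan fails. A pushed-down projection therefore needs a scan that exposes only the projected columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical toy nodes; rowType() stands in for Calcite's RelDataType
// as an ordered list of "name:TYPE" strings.
abstract class TypedRel {
  abstract List<String> rowType();
}

final class TypedScan extends TypedRel {
  final List<String> fields;

  TypedScan(List<String> fields) {
    this.fields = fields;
  }

  @Override
  List<String> rowType() {
    return fields;
  }
}

final class TypedFilter extends TypedRel {
  final TypedRel input;

  TypedFilter(TypedRel input) {
    this.input = input;
  }

  @Override
  List<String> rowType() {
    return input.rowType(); // a filter drops rows, never columns
  }
}

final class TypedProject extends TypedRel {
  final List<Integer> kept; // ordinals of the input fields to keep
  final TypedRel input;

  TypedProject(List<Integer> kept, TypedRel input) {
    this.kept = kept;
    this.input = input;
  }

  @Override
  List<String> rowType() {
    List<String> out = new ArrayList<>();
    for (int ordinal : kept) {
      out.add(input.rowType().get(ordinal));
    }
    return out;
  }
}

// Toy stand-in for RelOptRuleCall: transformTo only accepts a replacement
// whose row type equals that of the matched subtree.
final class ToyRuleCall {
  final TypedRel matched;
  TypedRel result;

  ToyRuleCall(TypedRel matched) {
    this.matched = matched;
  }

  void transformTo(TypedRel replacement) {
    if (!matched.rowType().equals(replacement.rowType())) {
      throw new IllegalArgumentException("row type mismatch: "
          + matched.rowType() + " vs " + replacement.rowType());
    }
    result = replacement;
  }

  public static void main(String[] args) {
    TypedScan scan = new TypedScan(Arrays.asList("name:VARCHAR", "age:INTEGER"));
    new ToyRuleCall(new TypedFilter(scan)).transformTo(scan); // accepted: same row type
    ToyRuleCall projectCall = new ToyRuleCall(new TypedProject(Arrays.asList(0), scan));
    projectCall.transformTo(new TypedScan(Arrays.asList("name:VARCHAR"))); // accepted
    try {
      projectCall.transformTo(scan); // rejected: the full scan still has the age column
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```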
Calcite also ships some built-in rules you can use, for example FilterJoinRule, FilterTableScanRule and so on.