'Spool' Node support


Hi folks,

I'm not sure whether there is a recommended way to represent a diverging
plan (one in which a node has multiple parents) in Calcite. The RelNode
data structure can represent multiple parents, but the optimizer does not
handle them correctly.

For example, consider the following query and its plan:

FROM (SELECT c1, random() as c2, c3 FROM src)
INSERT OVERWRITE TABLE src1 SELECT c1, c2
INSERT OVERWRITE TABLE src2 SELECT c3, c2

TableSink1(on columns c1, c2)
    Project(c1, random() as c2, c3)
        TableScan
TableSink2(on columns c3, c2)
    Project(c1, random() as c2, c3)
        TableScan
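
Below the sinks, the two branches above are structurally identical. A toy
sketch of how digest-based sharing turns them into one node with two
parents is shown here. It assumes made-up Node and Memo classes that intern
nodes by a digest string (operator text plus the digests of its inputs);
these are hypothetical and are not Calcite's RelNode or planner API.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model of digest-based sharing; NOT Calcite's RelNode API.
public class SharedSubplanDemo {
    static final class Node {
        final String op;
        final List<Node> inputs;
        final String digest; // operator text plus the digests of its inputs

        Node(String op, List<Node> inputs) {
            this.op = op;
            this.inputs = inputs;
            StringBuilder sb = new StringBuilder(op);
            for (Node in : inputs) {
                sb.append('[').append(in.digest).append(']');
            }
            this.digest = sb.toString();
        }
    }

    // Interns nodes by digest so structurally identical sub-plans become one node.
    static final class Memo {
        private final Map<String, Node> byDigest = new HashMap<>();

        Node register(String op, Node... inputs) {
            Node candidate = new Node(op, List.of(inputs));
            return byDigest.computeIfAbsent(candidate.digest, d -> candidate);
        }

        int size() {
            return byDigest.size();
        }
    }

    public static void main(String[] args) {
        Memo memo = new Memo();
        // Each INSERT branch builds its own Project over its own TableScan.
        Node p1 = memo.register("Project(c1, random() as c2, c3)", memo.register("TableScan(src)"));
        Node p2 = memo.register("Project(c1, random() as c2, c3)", memo.register("TableScan(src)"));
        memo.register("TableSink1(c1, c2)", p1);
        memo.register("TableSink2(c3, c2)", p2);
        System.out.println(p1 == p2);    // true: one Project with two parents
        System.out.println(memo.size()); // 4: TableScan, Project, TableSink1, TableSink2
    }
}
```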

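The planner recognizes that the two Projects (and the two TableScans) have
the same digest and merges them into one shared node, but the Project
transpose rules split them apart again. That breaks the semantics of
random(): c2 is supposed to be computed once and seen identically by both
sinks, but after the split each sink gets its own random() call.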
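
A toy Java illustration of why the split matters is given below. It only
models the effect of evaluating random() once for both sinks versus once
per sink; the method names shared() and split() are hypothetical, and the
sketch does not model any specific Calcite rule.

```java
import java.util.Arrays;
import java.util.Random;
import java.util.function.DoubleSupplier;

// Toy illustration only: models the effect of sharing vs. splitting a
// Project that computes c2 = random(); it does not model any Calcite rule.
public class RandomSplitDemo {
    // Shared Project: random() is evaluated once per row and both sinks read the same c2.
    static double[][] shared(int rows, DoubleSupplier random) {
        double[] c2 = new double[rows];
        for (int i = 0; i < rows; i++) {
            c2[i] = random.getAsDouble();
        }
        return new double[][] {c2, c2}; // {c2 seen by TableSink1, c2 seen by TableSink2}
    }

    // Split Project: each sink has its own copy, so random() is evaluated separately per sink.
    static double[][] split(int rows, DoubleSupplier random) {
        double[] sink1 = new double[rows];
        double[] sink2 = new double[rows];
        for (int i = 0; i < rows; i++) {
            sink1[i] = random.getAsDouble();
        }
        for (int i = 0; i < rows; i++) {
            sink2[i] = random.getAsDouble();
        }
        return new double[][] {sink1, sink2};
    }

    public static void main(String[] args) {
        Random rnd = new Random(42); // fixed seed so runs are repeatable
        double[][] s = shared(3, rnd::nextDouble);
        double[][] t = split(3, rnd::nextDouble);
        System.out.println(Arrays.equals(s[0], s[1])); // true: both sinks agree on c2
        System.out.println(Arrays.equals(t[0], t[1])); // compares the two sinks' c2 columns
    }
}
```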

My solution is to add a Spool node that prevents any rule from further
splitting a sub-plan, but it produces sub-optimal results. I've noticed
there is a very old JIRA ticket,
https://jira.apache.org/jira/browse/CALCITE-481, but work on it seems to
have stalled.
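
A minimal sketch of how a Spool can block a rule follows, assuming rules
match on a fixed parent/child operand pattern. Here Spool is just an
opaque marker node, and sinkOverProject() is a hypothetical stand-in for
a transpose rule's operand pattern; neither is a Calcite class or rule.

```java
import java.util.List;

// Hypothetical toy classes; "Spool" here is an opaque marker node, not a Calcite class.
public class SpoolDemo {
    static final class Node {
        final String kind;
        final List<Node> inputs;

        Node(String kind, Node... inputs) {
            this.kind = kind;
            this.inputs = List.of(inputs);
        }
    }

    // Operand pattern of a hypothetical transpose rule: a TableSink directly over a Project.
    static boolean sinkOverProject(Node n) {
        return n.kind.startsWith("TableSink")
            && n.inputs.size() == 1
            && n.inputs.get(0).kind.startsWith("Project");
    }

    public static void main(String[] args) {
        Node project = new Node("Project(c1, random() as c2, c3)", new Node("TableScan"));
        Node plain = new Node("TableSink1", project);
        // The Spool sits between the consumer and the shared sub-plan.
        Node spooled = new Node("TableSink1", new Node("Spool", project));
        System.out.println(sinkOverProject(plain));   // true: the rule could rewrite the Project
        System.out.println(sinkOverProject(spooled)); // false: the Spool hides the Project
    }
}
```

Because the Spool hides everything beneath it, it also blocks rewrites
that would not duplicate random(), which is one plausible reason the
resulting plans are sub-optimal.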

I'd like to move forward with this feature, but a few things need to be
done first:

1. Make RelOptRuleCall aware of parents. Currently only HepRelOptRuleCall
passes parents, and only in certain cases.
2. Let RelOptRuleOperand define patterns with multiple parents.
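
The two items above can be sketched in toy form, assuming a rule can look
up a node's parents. parentsOf(), safeToSplit() and matchesSharedBySinks()
are hypothetical helpers, not RelOptRuleCall or RelOptRuleOperand methods:
the first builds a child-to-parents map, the second is a guard a
split/transpose rule could check, and the third is a multiple-parent
pattern.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of parent-aware matching; not Calcite's rule API.
public class ParentAwareDemo {
    static final class Node {
        final String kind;
        final List<Node> inputs;

        Node(String kind, Node... inputs) {
            this.kind = kind;
            this.inputs = List.of(inputs);
        }
    }

    // Walks the DAG from its roots and records, for each node, the nodes that consume it.
    static Map<Node, List<Node>> parentsOf(List<Node> roots) {
        Map<Node, List<Node>> parents = new IdentityHashMap<>();
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        Deque<Node> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            Node n = stack.pop();
            if (!seen.add(n)) {
                continue; // shared node already visited; its edges are recorded
            }
            for (Node in : n.inputs) {
                parents.computeIfAbsent(in, k -> new ArrayList<>()).add(n);
                stack.push(in);
            }
        }
        return parents;
    }

    // Guard for a split/transpose rule: only fire when the node is not shared.
    static boolean safeToSplit(Node node, Map<Node, List<Node>> parents) {
        return parents.getOrDefault(node, List.of()).size() <= 1;
    }

    // Multiple-parent pattern: a node consumed by at least two TableSinks.
    static boolean matchesSharedBySinks(Node node, Map<Node, List<Node>> parents) {
        long sinks = parents.getOrDefault(node, List.of()).stream()
            .filter(p -> p.kind.startsWith("TableSink"))
            .count();
        return sinks >= 2;
    }

    public static void main(String[] args) {
        Node scan = new Node("TableScan");
        Node project = new Node("Project(c1, random() as c2, c3)", scan);
        Node sink1 = new Node("TableSink1", project);
        Node sink2 = new Node("TableSink2", project);
        Map<Node, List<Node>> parents = parentsOf(List.of(sink1, sink2));
        System.out.println(parents.get(project).size());          // 2
        System.out.println(safeToSplit(project, parents));         // false
        System.out.println(matchesSharedBySinks(project, parents)); // true
    }
}
```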

Please correct me if I've got anything wrong; any suggestions will be much
appreciated.