Re: 'Spool' Node support


I assume you’re talking about HepPlanner? VolcanoPlanner doesn’t “split” anything; it only adds new things.

As you’ve noticed, Spool isn’t finished. The idea would be to use VolcanoPlanner, because it can truly handle plans that are DAGs, and then use some kind of costing trick to ensure that shared nodes are counted only once in the overall cost.
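
To make the "count shared nodes once" idea concrete, here is a minimal sketch. It does not use Calcite's cost model or RelNode; the `Node` class, the cost numbers, and the method names are all hypothetical and exist only for illustration. It compares a tree-style cost, which charges a shared sub-plan once per parent, with a DAG-style cost that tracks visited nodes by identity.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

public class SharedCostSketch {
    /** Hypothetical plan node for illustration; this is not Calcite's RelNode. */
    static final class Node {
        final String name;
        final double selfCost;
        final List<Node> inputs;

        Node(String name, double selfCost, Node... inputs) {
            this.name = name;
            this.selfCost = selfCost;
            this.inputs = Arrays.asList(inputs);
        }
    }

    /** Tree-style cost: a shared node is charged once for every parent. */
    static double treeCost(Node n) {
        double total = n.selfCost;
        for (Node in : n.inputs) {
            total += treeCost(in);
        }
        return total;
    }

    /** DAG-style cost: each distinct node object is charged exactly once. */
    static double dagCost(Node root) {
        Set<Node> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        return dagCost(root, seen);
    }

    private static double dagCost(Node n, Set<Node> seen) {
        if (!seen.add(n)) {
            return 0.0; // already counted via another parent
        }
        double total = n.selfCost;
        for (Node in : n.inputs) {
            total += dagCost(in, seen);
        }
        return total;
    }

    /** Two sinks sharing one Project over one TableScan (made-up costs). */
    static Node examplePlan() {
        Node scan = new Node("TableScan", 10.0);
        Node project = new Node("Project", 2.0, scan);
        Node sink1 = new Node("TableSink1", 1.0, project);
        Node sink2 = new Node("TableSink2", 1.0, project);
        return new Node("Root", 0.0, sink1, sink2);
    }

    public static void main(String[] args) {
        Node root = examplePlan();
        System.out.println("tree cost = " + treeCost(root));
        System.out.println("dag cost  = " + dagCost(root));
    }
}
```

With these made-up numbers the tree-style cost is 26.0 and the DAG-style cost is 14.0, because the shared Project and TableScan are charged only once. How such a scheme would fit into VolcanoPlanner's actual costing is left open here.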

> On Oct 22, 2018, at 8:26 PM, Ted Xu <frankxus@xxxxxxxxx> wrote:
> 
> Hi folks,
> 
> I'm not sure whether there is a recommended way to represent a diverged
> (multiple-parent) plan in Calcite. The RelNode data structure is compatible
> with multiple parents, but this does not work in the optimizer.
> 
> For example, suppose we have the following query:
> 
> FROM (SELECT c1, random() as c2, c3 FROM src)
> INSERT OVERWRITE TABLE src1 SELECT c1, c2
> INSERT OVERWRITE TABLE src2 SELECT c3, c2
> 
> TableSink1(on columns c1, c2)
>    Project(c1, random() as c2, c3)
>        TableScan
> TableSink2(on columns c3, c2)
>    Project(c1, random() as c2, c3)
>        TableScan
> 
> Planners recognize that the Projects and TableScans share common digests
> and merge them, but the Project transpose rules split them again, which
> breaks the assumption that both sinks see the same random() values.
> 
> My solution is to add a Spool node to prevent any rule from further
> splitting a sub-plan, but it generates a sub-optimal result. I've noticed
> there is a really old JIRA ticket,
> https://jira.apache.org/jira/browse/CALCITE-481, but it was somehow
> suspended.
> 
> I'd like to move forward with this feature, but there is still some work
> to do first:
> 
> 1. Make RelOptRuleCall aware of parents; currently only HepRelOptRuleCall
> passes parents, and only in certain cases.
> 2. Let RelOptRuleOperand define multiple-parent patterns.
> 
> Please correct me if I've got something wrong; any suggestions will be
> much appreciated.
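
The digest behaviour described in the quoted message can be sketched in a few lines. This is a toy registry, not Calcite's implementation: the `DigestSketch` class, its methods, and the digest strings are hypothetical, and the "after rewriting" digests assume that a rule trims each Project down to the columns its own sink needs.

```java
import java.util.HashMap;
import java.util.Map;

public class DigestSketch {
    /** Hypothetical registry that interns plan nodes by their digest string. */
    private final Map<String, Integer> idsByDigest = new HashMap<>();

    /** Returns the id for this digest, registering a new node if unseen. */
    int register(String digest) {
        Integer id = idsByDigest.get(digest);
        if (id == null) {
            id = idsByDigest.size();
            idsByDigest.put(digest, id);
        }
        return id;
    }

    /** Number of distinct nodes registered so far. */
    int size() {
        return idsByDigest.size();
    }

    public static void main(String[] args) {
        DigestSketch planner = new DigestSketch();

        // Both sinks start with an identical Project, so the digests match
        // and the two inputs collapse into a single shared node.
        int a = planner.register("Project(c1, random() as c2, c3)[TableScan(src)]");
        int b = planner.register("Project(c1, random() as c2, c3)[TableScan(src)]");
        System.out.println("shared before rewriting: " + (a == b));

        // Assumed rewrite: each Project is trimmed for its own sink. The
        // digests now differ, so random() lives in two separate nodes.
        int c = planner.register("Project(c1, random() as c2)[TableScan(src)]");
        int d = planner.register("Project(c3, random() as c2)[TableScan(src)]");
        System.out.println("shared after rewriting: " + (c == d));
    }
}
```

Once the two Projects stop sharing a digest, nothing ties their random() calls together, which is the problem a Spool node is meant to prevent.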