git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: 'Spool' Node support


I get the impression you’re missing something important about how VolcanoPlanner works - equivalence sets. You speak of detecting duplicates by looking at RelNodes’ digests, whereas you should look at whether they are in the same equivalence set. 

And you also suggest that the previous plan is thrown away, and in VolcanoPlanner that is not the case. 

To take your project-project example. After merging projects there will be a choice of inputs - the combined project on top of a scan, and a project on top of the spool. They are equivalent. The project on top of the spool is likely the best plan - because it shares work with the other node that uses the spool - but it’s possible that the combined project prunes columns so aggressively that it’s better. 

So yes, Spool node would prevent plans diverging, but it’s really equivalence sets that are doing the work. Spool mainly helps with accounting.

Julian

> On Oct 22, 2018, at 8:46 PM, Ted Xu <frankxus@xxxxxxxxx> wrote:
> 
> Thanks Julian for the comment.
> 
> To some extent it is "split" in VolcanoPlanner. Let's revisit the example I
> mentioned above:
> 
> TableSink1(on columns c1, c2)
>    Project(c1, random() as c2, c3)
>        TableScan
> TableSink2(on columns c3, c2)
>    Project(c1, random() as c2, c3)
>        TableScan
> 
> The 2 projects share the common digests, thus it is recognized in
> VolcanoPlanner, the plan in memo looks like:
> 
> TableSink1     TableSink2
>         \                 /
>             Project (c1, random() as c2, c3)
>                 |
>            TableScan
> 
> However, if we "add" a project here, the project will further match other
> rule pattern (e.g., PrjectMergeRule), the random project will no longer
> share the same digest, new plan looks like:
> 
> TableSink1              TableSink2
>      |                               |
> Project (c1, random())   Project(c3, random())
>      |                               |
> TableScan(c1)         TableScan(c3)
> 
> Notice that the random function call is executed twice, which breaks the
> assumption of original sql query.
> 
> I think Spool node should prevent such kind of transformation.
> 
>> On Tue, Oct 23, 2018 at 11:30 AM Julian Hyde <jhyde@xxxxxxxxxx> wrote:
>> 
>> I assume you’re talking about HepPlanner? VolcanoPlanner doesn’t “split”
>> anything, it only adds new things.
>> 
>> As you’ve noticed Spool isn’t finished, but the idea would be to use
>> VolcanoPlanner, because it can truly handle plans that are DAGs, then use
>> some kind of costing trick to ensure that nodes that are shared are only
>> counted in the overall cost once.
>> 
>>> On Oct 22, 2018, at 8:26 PM, Ted Xu <frankxus@xxxxxxxxx> wrote:
>>> 
>>> Hi folks,
>>> 
>>> I'm not sure if there is a recommended way to represent diverged
>> (multiple
>>> parents) plan in Calcite. It’s true that RelNode data structure is
>>> compatible with multiple parents, but it is not working in optimizer.
>>> 
>>> For example, if we have query as follows,
>>> 
>>> FROM (SELECT c1, random() as c2, c3 FROM src)
>>> INSERT OVERWRITE TABLE src1 SELECT c1, c2
>>> INSERT OVERWRITE TABLE src2 SELECT c3, c2
>>> 
>>> TableSink1(on columns c1, c2)
>>>   Project(c1, random() as c2, c3)
>>>       TableScan
>>> TableSink2(on columns c3, c2)
>>>   Project(c1, random() as c2, c3)
>>>       TableScan
>>> 
>>> Planners will recognize Projects and TableScans share the common digests
>>> thus merged together, but Project Transpose Rules splits them, which
>> breaks
>>> the random assumption.
>>> 
>>> My solution is to add a Spool node to prevent any rule to further split a
>>> sub-plan, but it generates sub-optimal result. I've noticed there is a
>>> really old JIRA ticket https://jira.apache.org/jira/browse/CALCITE-481
>> but
>>> it was somehow suspended.
>>> 
>>> I'd like to move on on this feature, but there are still something to do
>>> first:
>>> 
>>> 1. Let RelOptRuleCall to aware parents, currently only HepRelOptRuleCall
>>> passes parents in certain cases.
>>> 2. Let RelOptRuleOperand to define multiple parent patterns
>>> 
>>> Please correct me if I'm something wrong, any suggestion will be much
>>> appreciated.
>> 
>>