Re: [DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once


While I think it's probably true, I'm hesitant to assume that the rewritten
version will always produce the same results just because it contains the
same number of calls. My preference would be to go with the "strict"
semantic as per Julian's specification, since that's least likely to cause
problems. If we allow other semantics, I would suggest that strict be the
default unless we can clearly prove that the other semantics always produce
correct results.
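
A minimal sketch of how the semantics discussed in this thread could gate a
project merge. It assumes each top (outer) project expression is a plain
reference to one bottom (inner) expression. None of the names below come
from Calcite: MergeSemanticsSketch, canMerge, and the SAME_CALL_COUNT label
(for the "same number of calls" proposal) are hypothetical.

```java
import java.util.List;

/** Hypothetical sketch; these names are illustrative and are not Calcite APIs. */
final class MergeSemanticsSketch {

  /** STRICT and RELAXED follow the thread; SAME_CALL_COUNT is a label invented here. */
  enum Semantic { STRICT, SAME_CALL_COUNT, RELAXED }

  /**
   * Decides whether the top project may be merged into the bottom project.
   *
   * @param semantic               the chosen semantic
   * @param bottomNonDeterministic one flag per bottom (inner) project expression
   * @param topRefs                indexes of the bottom expressions referenced by
   *                               the top project, in evaluation order
   */
  static boolean canMerge(Semantic semantic,
      List<Boolean> bottomNonDeterministic, List<Integer> topRefs) {
    if (semantic == Semantic.RELAXED) {
      return true; // relaxed: non-deterministic calls may be rewritten any time
    }
    int[] refCounts = new int[bottomNonDeterministic.size()];
    int lastNonDet = -1;
    boolean orderPreserved = true;
    for (int ref : topRefs) {
      refCounts[ref]++;
      if (bottomNonDeterministic.get(ref)) {
        if (ref < lastNonDet) {
          orderPreserved = false; // calls would run in a different order
        }
        lastNonDet = ref;
      }
    }
    // After inlining, a bottom expression is evaluated once per reference, so
    // the call count is unchanged only if it is referenced exactly once.
    for (int i = 0; i < refCounts.length; i++) {
      if (bottomNonDeterministic.get(i) && refCounts[i] != 1) {
        return false;
      }
    }
    // STRICT additionally requires the calls to keep their original order.
    return semantic == Semantic.SAME_CALL_COUNT || orderPreserved;
  }
}
```

For the query quoted below, the bottom project is (deptno, name, rand()) and
the top project references columns 1, 2, 2, so this check rejects the merge
under both STRICT and SAME_CALL_COUNT and allows it only under RELAXED.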

--
Michael Mior
mmior@xxxxxxxxxx


On Mon, Nov 19, 2018 at 13:45, Julian Hyde <jhyde@xxxxxxxxxx> wrote:

> Repeating the comments I made in the JIRA case [1].
>
> I do find your argument compelling, that if the rewritten version contains
> the same number of calls to the UDF, it should be OK.
>
> But there are other possible semantics. For instance, a “strict” semantic
> could allow rewrite only if the calls to the UDF are guaranteed to be the
> same number, and the same order. A “relaxed” semantic would allow
> non-deterministic functions (and dynamic functions, see [2]) to be
> rewritten any time.
>
> Perhaps there could be variants of this rule, one for each semantic, and
> the semantics could be chosen via a connection- or statement-level
> property. To enforce a particular semantic, several rules will need to
> modify their behavior (e.g. FilterProjectTransposeRule), so those rules
> would be parameterized on semantic also.
>
> Julian
>
> [2] https://issues.apache.org/jira/browse/CALCITE-2638
>
> > On Nov 19, 2018, at 7:51 AM, Hequn Cheng <chenghequn@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > Currently, there are several merge rules for Project, such as
> > CalcMergeRule, ProjectMergeRule, and ProjectCalcMergeRule. I found that
> > these merge rules should not be applied when a non-deterministic
> > expression in the bottom (inner) project is referenced more than once by
> > the top (outer) project. Take the following test as an example:
> >
> >  @Test public void testProjectMergeCalcMergeWithNonDeterministic()
> >      throws Exception {
> >    HepProgram program = new HepProgramBuilder()
> >            .addRuleInstance(FilterProjectTransposeRule.INSTANCE)
> >            .addRuleInstance(ProjectMergeRule.INSTANCE)
> >            .build();
> >
> >    checkPlanning(program,
> >            "select name, a as a1, a as a2 from (\n"
> >                    + "  select *, rand() as a\n"
> >                    + "  from dept)\n"
> >                    + "where deptno = 10\n");
> >  }
> >
> > The first select generates `a` from `rand()`, and the second select
> > generates `a1` and `a2` from `a`. According to the SQL, `a1` should
> > equal `a2`. Let's take a look at the resulting plan:
> >
> > LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
> >  LogicalFilter(condition=[=($0, 10)])
> >    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> >
> > In this plan, a1 may not equal a2, because merging the projects inlines
> > RAND() twice. This contradicts the SQL, which requires a1 to equal a2.
> > One option for keeping a1 equal to a2 is to disable these merge rules in
> > such cases, so that the resulting plan is:
> >
> > LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
> >  LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
> >    LogicalFilter(condition=[=($0, 10)])
> >      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
> >
> > Does anyone have ideas, or has anyone encountered similar problems? Any
> > suggestions are greatly appreciated.
> >
> > Best,
> > Hequn
> >
> > [1] jira link: https://issues.apache.org/jira/browse/CALCITE-2683
>
>
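
To make the difference between the two plans quoted above concrete, here is a
small sketch that mimics them outside of Calcite. It assumes a stand-in for
rand() that returns a new value on every call (a counter, so runs are
repeatable). The class and method names are hypothetical.

```java
import java.util.function.DoubleSupplier;

/** Hypothetical sketch; it models the two plans by hand and uses no Calcite APIs. */
final class InlineNonDeterministicSketch {

  /** Stand-in for rand(): yields a different value on each call. */
  static DoubleSupplier counterRand() {
    int[] calls = {0};
    return () -> ++calls[0];
  }

  /** Unmerged plan: the inner project calls rand() once; A1 and A2 both read $2. */
  static double[] unmerged(DoubleSupplier rand) {
    double a = rand.getAsDouble();
    return new double[] {a, a};
  }

  /** Merged plan: the reference to `a` is inlined twice, so rand() runs twice per row. */
  static double[] merged(DoubleSupplier rand) {
    return new double[] {rand.getAsDouble(), rand.getAsDouble()};
  }

  public static void main(String[] args) {
    double[] u = unmerged(counterRand());
    double[] m = merged(counterRand());
    System.out.println("unmerged: a1 == a2 ? " + (u[0] == u[1]));
    System.out.println("merged:   a1 == a2 ? " + (m[0] == m[1]));
  }
}
```

With this stand-in, the unmerged form always yields a1 equal to a2, while the
merged form yields two different values for the same row.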