[DISCUSS] [Calcite-2683] ProjectMergeRule should not be performed when Nondeterministic udf has been referenced more than once


Hi,

Calcite currently has several rules that merge projections, such as
CalcMergeRule, ProjectMergeRule, and ProjectCalcMergeRule. I found that these
rules should not fire when a nondeterministic expression in the bottom (inner)
project is referenced more than once by the top (outer) project. Take the
following test as an example:

  @Test public void testProjectMergeCalcMergeWithNonDeterministic() throws Exception {
    HepProgram program = new HepProgramBuilder()
            .addRuleInstance(FilterProjectTransposeRule.INSTANCE)
            .addRuleInstance(ProjectMergeRule.INSTANCE)
            .build();

    checkPlanning(program,
            "select name, a as a1, a as a2 from (\n"
                    + "  select *, rand() as a\n"
                    + "  from dept)\n"
                    + "where deptno = 10\n");
  }

The inner select produces `a` from `rand()`, and the outer select produces
`a1` and `a2` from `a`. According to the SQL, `a1` should always equal `a2`.
Let's take a look at the resulting plan:

LogicalProject(NAME=[$1], A1=[RAND()], A2=[RAND()])
  LogicalFilter(condition=[=($0, 10)])
    LogicalTableScan(table=[[CATALOG, SALES, DEPT]])

In this plan, a1 may not equal a2: merging the projects duplicated the RAND()
call, so it is evaluated once per output column. That contradicts the SQL, in
which a1 and a2 both come from the same a.
To keep a1 equal to a2, one option is to disable these merge rules in such
cases, so that the resulting plan will be:

LogicalProject(NAME=[$1], A1=[$2], A2=[$2])
  LogicalProject(DEPTNO=[$0], NAME=[$1], A=[RAND()])
    LogicalFilter(condition=[=($0, 10)])
      LogicalTableScan(table=[[CATALOG, SALES, DEPT]])
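
The condition proposed above (do not merge when a nondeterministic expression
of the bottom project is referenced more than once by the top project) can be
sketched as a small standalone check. This is only a toy model under stated
assumptions: the class MergeGuard and the method canMerge are hypothetical
names, not Calcite API, and each project is reduced to plain Java data (which
bottom fields each top expression reads, and which bottom fields are
nondeterministic). A real fix would have to inspect the RexNode trees inside
the rules themselves.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: these names are hypothetical and are not part of Calcite.
final class MergeGuard {

  /**
   * Returns true if merging is safe under the condition proposed in this thread.
   *
   * @param topRefs for each top (outer) project expression, the indexes of
   *     the bottom (inner) project fields it reads
   * @param bottomNonDeterministic for each bottom project field, whether its
   *     expression is nondeterministic (for example RAND())
   */
  static boolean canMerge(List<List<Integer>> topRefs,
      boolean[] bottomNonDeterministic) {
    // Count how many times the top project reads each bottom field.
    int[] refCount = new int[bottomNonDeterministic.length];
    for (List<Integer> refs : topRefs) {
      for (int index : refs) {
        refCount[index]++;
      }
    }
    for (int i = 0; i < refCount.length; i++) {
      // Merging inlines the bottom expression once per reference, so a
      // nondeterministic expression read twice would be evaluated twice.
      if (bottomNonDeterministic[i] && refCount[i] > 1) {
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Bottom project: DEPTNO=[$0], NAME=[$1], A=[RAND()]
    boolean[] bottom = {false, false, true};
    // Top project: NAME=[$1], A1=[$2], A2=[$2]
    List<List<Integer>> top = Arrays.asList(
        Arrays.asList(1), Arrays.asList(2), Arrays.asList(2));
    System.out.println(canMerge(top, bottom)); // prints "false"
  }
}
```

For the query in the test, field $2 (RAND()) is read by both A1 and A2, so the
check rejects the merge. If the top project read it only once, the merge would
still be allowed.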

Does anyone have a good idea for this, or has anyone run into similar
problems? Any suggestions are greatly appreciated.

Best,
Hequn

[1] JIRA link: https://issues.apache.org/jira/browse/CALCITE-2683