git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Help with EnumerableMergeJoinRule which is losing a RelCollection trait


Il giorno dom 23 set 2018 alle ore 00:11 Stamatis Zampetakis
<zabetak@xxxxxxxxx> ha scritto:
>
> Hi Enrico,
>
> I think a bit more context would help.
>
> What kind of org.apache.calcite.schema.Table are you using in your Schema?

Hi Stamatis,

This is my implementation
https://github.com/diennea/herddb/blob/0c7c01584350d57d8102511b987e5f880f3f65bd/herddb-core/src/main/java/herddb/sql/CalcitePlanner.java#L1237

essentially
@Override
public Statistic getStatistic() {
   return Statistics.of(tableManager.getStats().getTablesize(),  keys);
}

where "keys" is the Primary Key.

So I assume that this means that there is no intrinsic "collation" for the table
in my undestanding these collations in Statistic define a natural
order in the table, like in a table clustered by a index
and the system may assume that an EnumerableTableScan is naturally
sorted by that collation.

Bonus question:
can you give me some more explanation about the meaning of
"RelTraitSet#replace" ?
should it 'add' collations to the Rel or it is a conversion and so, it
if fails the EnumerableMergeJoinRule should not fire ?


Thank you very much
Enrico



> I suppose you already looked but just in case are there any kind of
> statistics on these tables.
> I've seen a similar case where the statistics of the table were declaring
> that some columns were sorted without this being true.
> I just want to make sure that this is not the case here.
>
> Best,
> Stamatis
>
>
> Στις Σάβ, 22 Σεπ 2018 στις 11:43 π.μ., ο/η Enrico Olivelli <
> eolivelli@xxxxxxxxx> έγραψε:
>
> > Hi,
> > We found a strange behaviour in an execution plan, basially we have an
> > EnumerableMergeJoin which has as input two non-sorted
> > EnumerableTableScan
> >
> > all the details are in this issue on HerdDB
> >
> > https://github.com/diennea/herddb/issues/262#issuecomment-423590573
> >
> > Cut and paste from the issue in the bottom of this email
> >
> > Any help is very appreciated, maybe some ring bells ....
> >
> > [ISSUE]
> >
> > query:
> > SELECT * FROM license t0, customer c WHERE c.customer_id = t0.customer_id
> >
> > It seems that Calcite is planning a Merge Join, but the tables are not
> > sorted according to the merge keys.
> >
> > "License" table:
> > TABLE PK (non clustered): [license_id]
> > COL: license_id serialPos: 0 (serialPos is the index of the colum for
> > Calcite)
> > COL: application serialPos: 1
> > COL: creation serialPos: 2
> > COL: data serialPos: 3
> > COL: deleted serialPos: 4
> > COL: modification serialPos: 5
> > COL: signature serialPos: 6
> > COL: customer_id serialPos: 7
> >
> > "Customer" table:
> > TABLE PK (non clustered): [customer_id]
> > COL: customer_id serialPos: 0
> > COL: contact_email serialPos: 1
> > COL: contact_person serialPos: 2
> > COL: creation serialPos: 3
> > COL: deleted serialPos: 4
> > COL: modification serialPos: 5
> > COL: name serialPos: 6
> > COL: vetting serialPos: 7
> >
> > the join is on PK (non clustered) column of table customer,
> > and the "customer_id" column of table 'license' which is not sorted
> > naturally by 'customerid' (we do not have clustered indexes !!)
> >
> > This is the plan:
> >
> > EnumerableMergeJoin(condition=[=($7, $9)], joinType=[inner]): rowcount
> > = 15.75, cumulative cost = {59.75 rows, 24.0 cpu, 0.0 io}, id = 114
> > EnumerableTableScan(table=[[herd, license]]): rowcount = 15.0,
> > cumulative cost = {15.0 rows, 16.0 cpu, 0.0 io}, id = 28
> > EnumerableTableScan(table=[[herd, customer]]): rowcount = 7.0,
> > cumulative cost = {7.0 rows, 8.0 cpu, 0.0 io}, id = 29
> >
> > EnumerableTableScan does not contain any information which tells that
> > the Scan MUST be sorted according to the join keys (field 7 in
> > "licence", and field 0 in "customer")
> >
> > Here in Calcite code the additional 'Collation' is lost as the
> > "replace" does not contain any 'RelCollation', so the inputs of the
> > join are not transformed
> >
> >
> > https://github.com/apache/calcite/blob/2ab83e468d282a9428e533853aea5253816889fb/core/src/main/java/org/apache/calcite/adapter/enumerable/EnumerableMergeJoinRule.java#L78
> >
> > is it a bug in Calcite or in how we are passing data to Calcite ?
> > Tables do not have any impliticit "collation" in HerdDB so we are not
> > passing any 'RelCollation'
> >
> >
> > Thank you
> >
> > Enrico
> >