Re: Best practice for exhaustive planning


Hi all,

Bumping this again because I'd like to be quite sure the answer is "Calcite
doesn't support this". For example, I'd like to reject full cartesian
joins. Currently, all joins can be converted to Beam convention and then
there's some logic later to complain about cross joins. I would prefer to
do this in the rule set, making a cross join just not convertible to Beam
convention, to incentivize finding other plans, but still give a user a
good error message.
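
To make the idea concrete, here is a minimal toy sketch of rejecting cross
joins at the rule level. Everything in it is hypothetical: LogicalJoin,
BeamJoin, convert and plan are illustrative stand-ins, not Calcite or Beam
classes, and a real planner would search alternatives rather than loop over
a list. It only shows the shape I have in mind: the conversion declines
cross joins, and a readable error is raised only if no alternative converts.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy stand-ins for planner concepts; none of these are Calcite or Beam classes.
final class CrossJoinRuleSketch {

    /** Hypothetical logical join: a null or literal-true condition models a cartesian join. */
    record LogicalJoin(String left, String right, String condition) {
        boolean isCross() {
            return condition == null || condition.equalsIgnoreCase("true");
        }
    }

    /** Hypothetical physical join in the target ("Beam") convention. */
    record BeamJoin(String left, String right, String condition) {}

    /** Rule-level guard: a cross join simply has no conversion. */
    static Optional<BeamJoin> convert(LogicalJoin join) {
        if (join.isCross()) {
            return Optional.empty();
        }
        return Optional.of(new BeamJoin(join.left(), join.right(), join.condition()));
    }

    /** Tries each equivalent alternative; fails with a readable error only if none converts. */
    static BeamJoin plan(List<LogicalJoin> equivalentAlternatives) {
        List<String> rejected = new ArrayList<>();
        for (LogicalJoin alt : equivalentAlternatives) {
            Optional<BeamJoin> converted = convert(alt);
            if (converted.isPresent()) {
                return converted.get();
            }
            rejected.add(alt.left() + " x " + alt.right());
        }
        throw new UnsupportedOperationException(
            "Cross joins are not supported: " + rejected);
    }
}
```

Calling plan with a cross join plus an equivalent equi-join returns the
equi-join conversion; calling it with only the cross join throws the
"Cross joins are not supported" error instead of a planner dump.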

What do people actually do in this situation? Possibilities: (a) scrape the
syntax before planning, which misses cases where a transformation might
end up with a viable plan; (b) make an "ErrorRel" with impossibly high cost
so it will only be chosen as a last resort, somewhat like yacc error
productions, though it could be hard to get a decent error message. I don't
particularly like either option.
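
For reference, option (b) could look roughly like the toy sketch below. The
Candidate type, its real/error factories and the choose method are all
hypothetical names of mine, not Calcite classes, and the "impossibly high"
cost is modeled as Double.MAX_VALUE under the assumption that real plans
always cost less. The error placeholder carries its own message, so it can
be surfaced only when it is the cheapest thing left.

```java
import java.util.Comparator;
import java.util.List;

// Toy cost-based choice; these types are illustrative and not Calcite classes.
final class ErrorRelSketch {

    /** A candidate plan with a cost; errorMessage is non-null only for error placeholders. */
    record Candidate(String description, double cost, String errorMessage) {
        static Candidate real(String description, double cost) {
            return new Candidate(description, cost, null);
        }

        /** Hypothetical "ErrorRel": always available, but at a huge finite cost. */
        static Candidate error(String message) {
            return new Candidate("ErrorRel", Double.MAX_VALUE, message);
        }

        boolean isError() {
            return errorMessage != null;
        }
    }

    /** Picks the cheapest candidate and turns a winning error placeholder into an exception. */
    static Candidate choose(List<Candidate> candidates) {
        Candidate best = candidates.stream()
            .min(Comparator.comparingDouble(Candidate::cost))
            .orElseThrow(() -> new IllegalArgumentException("no candidates"));
        if (best.isError()) {
            // Assumes real plans cost less than Double.MAX_VALUE, so this is the last resort.
            throw new UnsupportedOperationException(best.errorMessage());
        }
        return best;
    }
}
```

With both a real candidate and an error placeholder in the list, choose
returns the real one; with only the placeholder, it throws with the
placeholder's message.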

Kenn

On Wed, May 30, 2018 at 6:10 AM Michael Mior <mmior@xxxxxxxxxx> wrote:

> Unfortunately, I'm not sure of the best way to proceed from here, but
> it seems like you're making progress :)
> --
> Michael Mior
> mmior@xxxxxxxxxx
>
>
>
> On Tue, May 29, 2018 at 18:29, Kenneth Knowles <klk@xxxxxxxxxx.invalid>
> wrote:
>
> > Thanks Michael,
> >
> > I don't think that applies in our case - we aren't doing a table scan and
> > having Calcite implement the rest, but are translating the whole plan to a
> > Beam pipeline to run on e.g. Flink, Spark, Dataflow.
> >
> > Here's an example:
> >
> >     SELECT * FROM UNNEST (ARRAY ['a', 'b', 'c'])
> >
> > With logical plan:
> >
> >     LogicalProject(EXPR$0=[$0])
> >       Uncollect
> >         LogicalProject(EXPR$0=[ARRAY('a', 'b', 'c')])
> >           LogicalValues(tuples=[[{ 0 }]])
> >
> > And the planner dumps "could not be implemented" when going for Beam's
> > calling convention. So I implement a rel & a rule.
> >
> > Then there's the correlated version exploding an array field from a table:
> >
> >     SELECT f_int, arrElems.f_string
> >     FROM main CROSS JOIN UNNEST (main.f_stringArr) AS arrElems(f_string)
> >
> > With logical plan:
> >
> >     LogicalProject(f_int=[$0], f_string=[$2])
> >       LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
> >         BeamIOSourceRel(table=[[beam, main]])
> >         Uncollect
> >           LogicalProject(f_stringArr=[$cor0.f_stringArr_1])
> >             LogicalValues(tuples=[[{ 0 }]])
> >
> > I hacked something together to support this, too. I did not fully implement
> > Correlate; I would love to reject unsupported things in a meaningful way. I
> > would like to have confidence that there are not other permutations of
> > logical plans that we missed. For example, for joins we match all joins and
> > translate them, then throw an error at a later stage.
> >
> > Incidentally, when I ran the decorrelation [1] it appeared to have no
> > effect. We probably want to implement it directly in Beam anyhow in this
> > case.
> >
> > Kenn
> >
> > [1] https://calcite.apache.org/apidocs/org/apache/calcite/sql2rel/SqlToRelConverter.html#decorrelate-org.apache.calcite.sql.SqlNode-org.apache.calcite.rel.RelNode-
> >
> > On Tue, May 22, 2018 at 6:39 PM Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> >
> > > For most queries, the only thing you should need to implement is a scan and
> > > the rest can usually be implemented by Calcite. It would be good if you
> > > have a specific example of a query that fails.
> > >
> > > --
> > > Michael Mior
> > > mmior@xxxxxxxxxxxx
> > >
> > >
> > > On Tue, May 22, 2018 at 19:01, Kenneth Knowles <klk@xxxxxxxxxx.invalid>
> > > wrote:
> > >
> > > > Bumping this, as it ended up in spam for some people.
> > > >
> > > > Kenn
> > > >
> > > > On Tue, May 15, 2018 at 9:26 AM Kenneth Knowles <klk@xxxxxxxxxx> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > Beam SQL uses Calcite for parsing and (naive) planning. Currently it is
> > > > > pretty easy to write a SQL query that parses and causes a "could not plan"
> > > > > dump when we ask the planner to convert things to the Beam calling
> > > > > convention. One current example is using UNNEST on a column to yield a
> > > > > LogicalCorrelate + Uncollect.
> > > > >
> > > > > There may obviously always be bits we don't support, but we'd like to
> > > > > ensure that the user encounters a careful error message rather than a
> > > > > planner dump. Is there a best practice for ensuring that we have covered
> > > > > all the cases? Is it just "everything named Logical*" or is there something
> > > > > more clever?
> > > > >
> > > > > And if this question demonstrates that we are using Calcite entirely
> > > > > wrong, let us know :-)
> > > > >
> > > > > Kenn
> > > > >
> > > >
> > >
> >
>