Re: Best practice for exhaustive planning


My advice: block the transformation to a particular convention. Then, if
you get a "cannot plan" error, examine the plan to determine whether it
contains specific problematic patterns. If it does, make a best guess at
the particular reason and return it to the user. This covers additional
situations that syntax scraping would get wrong, such as when a user
writes this query:

select * from a,b
where a.id = b.id

In this case, with the correct rules, the query will get planned. However, a
SQL scrape could have flagged it as an invalid cartesian join.
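
The "examine the plan" step could look roughly like the sketch below. It is
a toy model in plain Java with no Calcite classes: Node, Scan, Join, diagnose
and userMessage are all hypothetical names, and a join condition of "true" is
assumed to stand for a join that the rules could not turn into anything better
than a cartesian product. A real version would walk the planner's own plan
tree instead.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Optional;

final class PlanDiagnostics {
    /** Toy plan node; a stand-in for a relational expression, not a Calcite type. */
    static abstract class Node {
        final List<Node> inputs;
        Node(List<Node> inputs) { this.inputs = inputs; }
    }

    static final class Scan extends Node {
        final String table;
        Scan(String table) { super(Collections.<Node>emptyList()); this.table = table; }
    }

    static final class Join extends Node {
        final String condition; // "true" models a join with no usable predicate
        Join(Node left, Node right, String condition) {
            super(Arrays.asList(left, right));
            this.condition = condition;
        }
    }

    /** Depth-first search for the first pattern we know we cannot support. */
    static Optional<String> diagnose(Node node) {
        if (node instanceof Join && "true".equals(((Join) node).condition)) {
            return Optional.of("Cross joins (joins without a join condition) are not supported.");
        }
        for (Node input : node.inputs) {
            Optional<String> reason = diagnose(input);
            if (reason.isPresent()) {
                return reason;
            }
        }
        return Optional.empty();
    }

    /** Called only after planning has failed; falls back to a generic message. */
    static String userMessage(Node failedPlan) {
        return diagnose(failedPlan).orElse("The query could not be planned.");
    }

    public static void main(String[] args) {
        Node crossJoin = new Join(new Scan("a"), new Scan("b"), "true");
        System.out.println(userMessage(crossJoin));
    }
}
```

Because the check runs only after planning has failed, the query above never
reaches it: once the rules have moved a.id = b.id into the join, the plan
converts normally and no diagnosis is needed.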



On Wed, Jun 20, 2018 at 1:10 PM, Kenneth Knowles <klk@xxxxxxxxxx.invalid>
wrote:

> Hi all,
>
> Bumping this again because I'd like to be quite sure the answer is "Calcite
> doesn't support this". For example, I'd like to reject full cartesian
> joins. Currently, all joins can be converted to Beam convention and then
> there's some logic later to complain about cross joins. I would prefer to
> do this in the rule set, making a cross join just not convertible to Beam
> convention, to incentivize finding other plans, but still give a user a
> good error message.
>
> What do people actually do in this situation? Possibilities: (a) scrape the
> syntax before planning, missing opportunities where a transformation might
> end up with a viable plan; or (b) make an "ErrorRel" with impossibly high
> cost so it will only be chosen as the last resort, somewhat like yacc error
> productions, though it could be hard to get a decent error message. I don't
> particularly like these options.
>
> Kenn
>
> On Wed, May 30, 2018 at 6:10 AM Michael Mior <mmior@xxxxxxxxxx> wrote:
>
> > Unfortunately, I'm not sure of the best way to proceed from here, but
> > it seems like you're making progress :)
> > --
> > Michael Mior
> > mmior@xxxxxxxxxx
> >
> >
> >
> > On Tue, May 29, 2018 at 6:29 PM, Kenneth Knowles <klk@xxxxxxxxxx.invalid>
> > wrote:
> >
> > > Thanks Michael,
> > >
> > > I don't think that applies in our case - we aren't doing a table scan
> > > and having Calcite implement the rest, but are translating the whole
> > > plan to a Beam pipeline to run on e.g. Flink, Spark, Dataflow.
> > >
> > > Here's an example:
> > >
> > >     SELECT * FROM UNNEST (ARRAY ['a', 'b', 'c'])
> > >
> > > With logical plan:
> > >
> > >     LogicalProject(EXPR$0=[$0])
> > >       Uncollect
> > >         LogicalProject(EXPR$0=[ARRAY('a', 'b', 'c')])
> > >           LogicalValues(tuples=[[{ 0 }]])
> > >
> > > And the planner dumps "could not be implemented" when going for Beam's
> > > calling convention. So I implement a rel & a rule.
> > >
> > > Then there's the correlated version, exploding an array field from a
> > > table:
> > >
> > >     SELECT f_int, arrElems.f_string FROM main CROSS JOIN UNNEST
> > >     (main.f_stringArr) AS arrElems(f_string)
> > >
> > > With logical plan:
> > >
> > >     LogicalProject(f_int=[$0], f_string=[$2])
> > >       LogicalCorrelate(correlation=[$cor0], joinType=[inner], requiredColumns=[{1}])
> > >         BeamIOSourceRel(table=[[beam, main]])
> > >         Uncollect
> > >           LogicalProject(f_stringArr=[$cor0.f_stringArr_1])
> > >             LogicalValues(tuples=[[{ 0 }]])
> > >
> > > I hacked something together to support this, too. I did not fully
> > > implement Correlate; I would love to reject unsupported things in a
> > > meaningful way. I would like to have confidence that there are not
> > > other permutations of logical plans that we missed. For example, for
> > > joins we match all joins and translate them, then throw an error at a
> > > later stage.
> > >
> > > Incidentally, when I ran the decorrelation [1] it appeared to have no
> > > effect. We probably want to implement it directly in Beam anyhow in
> > > this case.
> > >
> > > Kenn
> > >
> > > [1]
> > >
> > >
> > > https://calcite.apache.org/apidocs/org/apache/calcite/sql2rel/SqlToRelConverter.html#decorrelate-org.apache.calcite.sql.SqlNode-org.apache.calcite.rel.RelNode-
> > >
> > > On Tue, May 22, 2018 at 6:39 PM Michael Mior <mmior@xxxxxxxxxxxx>
> > > wrote:
> > >
> > > > For most queries, the only thing you should need to implement is a
> > > > scan and the rest can usually be implemented by Calcite. It would be
> > > > good if you have a specific example of a query that fails.
> > > >
> > > > --
> > > > Michael Mior
> > > > mmior@xxxxxxxxxxxx
> > > >
> > > >
> > > > On Tue, May 22, 2018 at 7:01 PM, Kenneth Knowles <klk@xxxxxxxxxx.invalid>
> > > > wrote:
> > > >
> > > > > Bumping this, as it ended up in spam for some people.
> > > > >
> > > > > Kenn
> > > > >
> > > > > On Tue, May 15, 2018 at 9:26 AM Kenneth Knowles <klk@xxxxxxxxxx>
> > > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > Beam SQL uses Calcite for parsing and (naive) planning. Currently
> > > > > > it is pretty easy to write a SQL query that parses and causes a
> > > > > > "could not plan" dump when we ask the planner to convert things to
> > > > > > the Beam calling convention. One current example is using UNNEST on
> > > > > > a column to yield a LogicalCorrelate + Uncollect.
> > > > > >
> > > > > > There may obviously always be bits we don't support, but we'd like
> > > > > > to ensure that the user encounters a careful error message rather
> > > > > > than a planner dump. Is there a best practice for ensuring that we
> > > > > > have covered all the cases? Is it just "everything named Logical*"
> > > > > > or is there something more clever?
> > > > > >
> > > > > > And if this question demonstrates that we are using Calcite
> > > > > > entirely wrong, let us know :-)
> > > > > >
> > > > > > Kenn
> > > > > >
> > > > >
> > > >
> > >
> >
>

