Re: Complex Types Support in DDL


Thank you for the great advice and the feedback.

For my original question, I will evaluate what makes more sense in our case
based on suggestions in this thread. Forking the server probably makes more
sense for us at the moment, but I will look at CREATE TYPE as well.

Thank you,
Anton

On Wed, May 2, 2018 at 12:58 PM Julian Hyde <jhyde@xxxxxxxxxx> wrote:

> Agreed.
>
> Test re-use = specification re-use.
>
> Code re-use = much harder.
>
> > On May 2, 2018, at 12:38 PM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> >
> > That makes sense to me. I agree that it's probably not very useful to try
> > to share anything in the parser between calcite-server and calcite-babel
> > since calcite-babel will always be a moving target. However, given that
> > calcite-babel is intended to be particularly permissive, it would be great
> > to have a way to run calcite-server DDL tests against calcite-babel.
> >
> > --
> > Michael Mior
> > mmior@xxxxxxxxxxxx
> >
> >
> > On Wed, May 2, 2018 at 14:34, Shuyi Chen <suez1224@xxxxxxxxx> wrote:
> >
> >> Yes, that's what I have in mind as well. The server module is effectively
> >> Calcite's own DDL; people who use Calcite directly can just use the server
> >> module for their DDL needs. Other SQL dialects have their own DDL, and in
> >> order for them to leverage Calcite's relational algebra and query planning,
> >> the Babel parser needs to be able to parse both the DML and the DDL of their
> >> own dialect. Is that clear?
> >>
> >> On Wed, May 2, 2018 at 11:23 AM, Julian Hyde <jhyde@xxxxxxxxxx> wrote:
> >>
> >>> The principles are as follows:
> >>> * Server should expose, as DDL, the concepts in Calcite’s framework, no
> >>> more, no less. This includes the ability to define a type if supported by
> >>> Calcite’s type system (RelDataTypeFactory), and the ability to define
> >>> materialized views and lattices.
> >>> * Babel should expose anything in a supported SQL dialect (or rather,
> >>> anything that someone has found time to support).
> >>>
> >>> Server’s specification is relatively fixed, whereas Babel’s specification
> >>> is growing and changing all the time.
> >>>
> >>> Julian
> >>>
> >>>
> >>>> On May 2, 2018, at 10:06 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> >>>>
> >>>> Seems logical to me, although I wonder if there's any way we could easily
> >>>> make the DDL part of the parser modular. At least before going too far down
> >>>> the road of implementing DDL in Babel, it would be good to set a clear
> >>>> scope of what will exist in calcite-babel vs. calcite-server.
> >>>>
> >>>> --
> >>>> Michael Mior
> >>>> mmior@xxxxxxxxxxxx
> >>>>
> >>>> 2018-05-02 12:57 GMT-04:00 Julian Hyde <jhyde@xxxxxxxxxx>:
> >>>>
> >>>>> By the way. We should also figure out how this fits with the project to
> >>>>> create a lenient parser that can handle any dialect of SQL. I am calling
> >>>>> that parser “Babel”[1]. That parser will be able to handle BigQuery
> >>>>> dialect, among others.
> >>>>>
> >>>>> Here’s my current thinking.
> >>>>>
> >>>>> I think that Babel should be a new module (a sibling to calcite-server,
> >>>>> calcite-druid etc.) and its parser will extend the core parser. That means
> >>>>> that calcite-babel will not inherit from the DDL parser in the
> >>>>> calcite-server module, nor vice versa. We will probably end up with two
> >>>>> parsers that are capable of handling DDL, and two sets of AST classes. But
> >>>>> I think that is OK, or at least, better than the chaos of trying to reuse
> >>>>> too much. At least, the parsers will share 99% of their DNA with the core
> >>>>> parser. And we can easily share tests.
> >>>>>
> >>>>> Julian
> >>>>>
> >>>>> [1] https://issues.apache.org/jira/browse/CALCITE-2280
> >>>>>
> >>>>>> On May 1, 2018, at 11:16 PM, Shuyi Chen <suez1224@xxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Hi Anton, thanks a lot for the great questions.
> >>>>>>
> >>>>>> Yes, SqlDataTypeSpec currently only supports creating simple SQL types;
> >>>>>> row, array, and map types are not supported.
> >>>>>>
> >>>>>> CALCITE-2045 adds support for defining custom types, either simple or row
> >>>>>> types, through the type DDL, and you should be able to use the UDT in your
> >>>>>> table DDL for complex row types. I think this should be close to what you
> >>>>>> want.
> >>>>>>
> >>>>>> You can extend the type DDL in its current form in the Beam parser and add
> >>>>>> support for map and array types, or modify the grammar to suit your needs
> >>>>>> and make it BigQuery compatible. All the changes required to support UDTs
> >>>>>> in calcite-core should already be done by CALCITE-2045.
> >>>>>>
> >>>>>> As for the BigQuery syntax, I am not sure it's a good idea to adopt it in
> >>>>>> the core parser unless there is no SQL equivalent, but if you implement it
> >>>>>> in your extended Beam parser, it's up to you, and that's by design of
> >>>>>> Calcite DDL.
> >>>>>>
> >>>>>> Let me know if it helps.
> >>>>>>
> >>>>>> Thanks
> >>>>>> Shuyi
> >>>>>>
> >>>>>> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <kedin@xxxxxxxxxx.invalid>
> >>>>>> wrote:
> >>>>>>
> >>>>>>> Hi,
> >>>>>>>
> >>>>>>> We want to add support for non-primitive types (ROW, ARRAY, MAP) to Apache
> >>>>>>> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
> >>>>>>> to approach this?
> >>>>>>>
> >>>>>>> *Our Use Case:*
> >>>>>>> We want to be able to use DDL to define data sources and sinks for Beam
> >>>>>>> pipelines, so that users don't have to wrap SQL into custom code which
> >>>>>>> configures sources/sinks.
> >>>>>>>
> >>>>>>> *What we have already:*
> >>>>>>> We have a customized CREATE TABLE statement which allows users to specify
> >>>>>>> the type of the data source, its schema, and data location. The
> >>>>>>> implementation is based on Calcite DDL extensions.
> >>>>>>>
> >>>>>>> *What we're missing:*
> >>>>>>> We need to be able to define schemas with non-primitive types, e.g.
> >>>>>>> arrays or rows, so that we can correctly describe data sources and sinks
> >>>>>>> which support such types. For example, if we want to manipulate data in a
> >>>>>>> stream of JSON objects, we want to be able to describe the JSON contents
> >>>>>>> somehow, including arrays or nested objects. Or we would need similar types
> >>>>>>> to interact with BigQuery, which supports arrays and nested struct types.
> >>>>>>>
> >>>>>>> *Problem:*
> >>>>>>> I tried to check if it is possible to extend the parser using the
> >>>>>>> config.fmpp approach, so that we can hook into the Parser.TypeName()
> >>>>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
> >>>>>>> method and parse the complex types ourselves. But Parser.DataType()
> >>>>>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
> >>>>>>> creates SqlDataTypeSpec only in two specific ways, without the ability to
> >>>>>>> extend it, so even if we parse the type name ourselves, we would not be able
> >>>>>>> to construct the SqlDataTypeSpec in a way that supports arrays/rows. But even
> >>>>>>> if we could, looking at SqlDataTypeSpec
> >>>>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L327>
> >>>>>>> it seems that it does not support creating arrays or rows either: it calls
> >>>>>>> typeFactory.createSqlType(typename)
> >>>>>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L350>
> >>>>>>> which only
> >>>>>>> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L49>
> >>>>>>> creates basic types in this call.
> >>>>>>>
> >>>>>>> *Path forward:*
> >>>>>>> If the above is correct, then it appears that we would need to patch
> >>>>>>> Calcite in a couple of places to support arrays, rows, and maps in DDL:
> >>>>>>>  - update Parser.jj to support parsing the type definitions for the
> >>>>>>> required types and constructing SqlDataTypeSpec correctly for those cases;
> >>>>>>>  - update SqlDataTypeSpec.java to handle complex types and invoke the
> >>>>>>> correct typeFactory interfaces.
> >>>>>>>
> >>>>>>> *Questions:*
> >>>>>>> - does the above sound sane/correct?
> >>>>>>> - is there similar work already tracked in Calcite somewhere? I saw
> >>>>>>> something mentioned in CALCITE-2045
> >>>>>>> <https://issues.apache.org/jira/browse/CALCITE-2045?focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
> >>>>>>> but didn't see any tracking Jiras specifically for this work yet;
> >>>>>>> - is there a known/recommended/working syntax for such DDL? If there is
> >>>>>>> none, then would it make sense to adopt something similar to the BigQuery
> >>>>>>> STRUCT/ARRAY definition
> >>>>>>> <https://cloud.google.com/bigquery/docs/data-definition-language>?
> >>>>>>>
> >>>>>>> Thank you,
> >>>>>>> Anton
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> --
> >>>>>> "So you have to trust that the dots will somehow connect in your future."
> >>>
> >>>
> >>
> >>
> >> --
> >> "So you have to trust that the dots will somehow connect in your future."
> >>
>
>