Re: Complex Types Support in DDL


That makes sense to me. I agree that it's probably not very useful to try
to share anything in the parser between calcite-server and calcite-babel
since calcite-babel will always be a moving target. However, given that
calcite-babel is intended to be particularly permissive, it would be great
to have a way to run calcite-server DDL tests against calcite-babel.

--
Michael Mior
mmior@xxxxxxxxxxxx


On Wed, May 2, 2018 at 14:34, Shuyi Chen <suez1224@xxxxxxxxx> wrote:

> Yes, that's what I have in mind as well. The server module is essentially
> Calcite's own DDL; people who use Calcite directly can just use the server
> module for their DDL needs. Other SQL dialects have their own DDL, and in
> order for them to leverage Calcite's relational algebra and query planning,
> the Babel parser needs to be able to parse both the DML and the DDL of their
> dialect. Is that clear?
>
> On Wed, May 2, 2018 at 11:23 AM, Julian Hyde <jhyde@xxxxxxxxxx> wrote:
>
> > The principles are as follows:
> >  * Server should expose, as DDL, the concepts in Calcite’s framework, no
> > more, no less. This includes the ability to define a type if supported by
> > Calcite’s type system (RelDataTypeFactory), and the ability to define
> > materialized views and lattices.
> >  * Babel should expose anything in a supported SQL dialect (or rather,
> > anything that someone has found time to support).
> >
> > Server’s specification is relatively fixed, whereas Babel’s specification
> > is growing and changing all the time.
> >
> > Julian
> >
> >
> > > On May 2, 2018, at 10:06 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> > >
> > > > Seems logical to me, although I wonder if there's any way we could easily
> > > > make the DDL part of the parser modular. At least before going too far down
> > > > the road of implementing DDL in Babel, it would be good to set a clear
> > > > scope of what will exist in calcite-babel vs. calcite-server.
> > >
> > > --
> > > Michael Mior
> > > > mmior@xxxxxxxxxxxx
> > > >
> > > > 2018-05-02 12:57 GMT-04:00 Julian Hyde <jhyde@xxxxxxxxxx>:
> > >
> > > >> By the way, we should also figure out how this fits with the project to
> > > >> create a lenient parser that can handle any dialect of SQL. I am calling
> > > >> that parser “Babel”[1]. That parser will be able to handle BigQuery
> > > >> dialect, among others.
> > > >>
> > > >> Here’s my current thinking.
> > > >>
> > > >> I think that Babel should be a new module (a sibling to calcite-server,
> > > >> calcite-druid etc.) and its parser will extend the core parser. That means
> > > >> that calcite-babel will not inherit from the DDL parser in the
> > > >> calcite-server module, nor vice versa. We will probably end up with two
> > > >> parsers that are capable of handling DDL, and two sets of AST classes. But
> > > >> I think that is OK, or at least, better than the chaos of trying to reuse
> > > >> too much. At least, the parsers will share 99% of their DNA with the core
> > > >> parser. And we can easily share tests.
> > >>
> > >> Julian
> > >>
> > > >> [1] https://issues.apache.org/jira/browse/CALCITE-2280
> > >>
> > >>> On May 1, 2018, at 11:16 PM, Shuyi Chen <suez1224@xxxxxxxxx> wrote:
> > >>>
> > >>> Hi Anton, thanks a lot for the great questions.
> > >>>
> > > >>> Yes, SqlDataTypeSpec currently only supports creating simple SQL types;
> > > >>> row, array, and map types are not supported.
> > > >>>
> > > >>> CALCITE-2045 adds support for defining custom types, either simple or row,
> > > >>> through the type DDL, and you should be able to use the UDT in your table
> > > >>> DDL for a complex row type. I think this should be close to what you want.
> > > >>>
> > > >>> You can extend the type DDL in its current form in the Beam parser and add
> > > >>> support for map and array types, or modify the grammar to suit your needs
> > > >>> and make it BigQuery-compatible. All the changes required to support UDTs
> > > >>> in calcite-core should already be done by CALCITE-2045.
> > > >>>
> > > >>> As for the BigQuery syntax, I am not sure it's a good idea to adopt it
> > > >>> in the core parser unless there is no SQL equivalent, but if you implement
> > > >>> it in your extended Beam parser, it's up to you; that is by design in
> > > >>> Calcite DDL.
> > >>>
> > >>> Let me know if it helps.
> > >>>
> > >>> Thanks
> > >>> Shuyi
> > >>>
> > > >>> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <kedin@xxxxxxxxxx.invalid> wrote:
> > >>>
> > >>>> Hi,
> > >>>>
> > > >>>> We want to add support for non-primitive types (ROW, ARRAY, MAP) to Apache
> > > >>>> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
> > > >>>> to approach this?
> > >>>>
> > >>>> *Our Use Case:*
> > > >>>> We want to be able to use DDL to define data sources and sinks for Beam
> > > >>>> pipelines, so that users don't have to wrap SQL into custom code which
> > > >>>> configures sources/sinks.
> > >>>>
> > >>>> *What we have already:*
> > > >>>> We have a customized CREATE TABLE statement which allows users to specify
> > > >>>> the type of the data source, its schema, and data location. The
> > > >>>> implementation is based on Calcite DDL extensions.
> > >>>>
> > >>>> *What we're missing:*
> > > >>>> We need to be able to define schemas with non-primitive types, e.g.
> > > >>>> arrays or rows, so that we can correctly describe data sources and sinks
> > > >>>> which support such types. For example, if we want to manipulate data in a
> > > >>>> stream of JSON objects, we want to be able to describe the JSON contents
> > > >>>> somehow, including arrays or nested objects. Or we would need similar types
> > > >>>> to interact with BigQuery, which supports arrays and nested struct types.
> > >>>>
> > >>>> *Problem:*
> > > >>>> I tried to check whether it is possible to extend the parser using the
> > > >>>> config.fmpp approach, so that we can hook into the Parser.TypeName()
> > > >>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
> > > >>>> method and parse the complex types ourselves. But Parser.DataType()
> > > >>>> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
> > > >>>> creates SqlDataTypeSpec only in two specific ways, with no ability to
> > > >>>> extend it, so even if we parse the type name ourselves, we would not be
> > > >>>> able to construct the SqlDataTypeSpec in a way that supports arrays/rows.
> > > >>>> And even if we could, looking at SqlDataTypeSpec
> > > >>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L327>
> > > >>>> it seems that it does not support creating arrays or rows either: it calls
> > > >>>> typeFactory.createSqlType(typename)
> > > >>>> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L350>
> > > >>>> which only
> > > >>>> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L49>
> > > >>>> creates basic types in this call.
> > >>>>
> > >>>> *Path forward:*
> > > >>>> If the above is correct, then it appears that we would need to patch
> > > >>>> Calcite in a couple of places to support arrays, rows, and maps in DDL:
> > > >>>>   - update Parser.jj to support parsing the type definitions for the
> > > >>>> required types and constructing SqlDataTypeSpec correctly for those cases;
> > > >>>>   - update SqlDataTypeSpec.java to handle complex types and invoke
> > > >>>> the correct typeFactory interfaces.
> > >>>>
> > >>>> *Questions:*
> > >>>> - does the above sound sane/correct?
> > > >>>> - is there similar work already tracked in Calcite somewhere? I saw
> > > >>>> something mentioned in CALCITE-2045
> > > >>>> <https://issues.apache.org/jira/browse/CALCITE-2045?focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
> > > >>>> but didn't see any tracking Jiras specifically for this work yet;
> > > >>>> - is there a known/recommended/working syntax for such DDL? If there is
> > > >>>> none, would it make sense to adopt something similar to the BigQuery
> > > >>>> STRUCT/ARRAY definition
> > > >>>> <https://cloud.google.com/bigquery/docs/data-definition-language>?
> > >>>>
> > >>>> Thank you,
> > >>>> Anton
> > >>>>
> > >>>
> > >>>
> > >>>
> > >>> --
> > >>> "So you have to trust that the dots will somehow connect in your
> > future."
> >
> >
>
>
> --
> "So you have to trust that the dots will somehow connect in your future."
>
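
The "Path forward" in Anton's message (teach the type grammar about ARRAY, ROW, and MAP, and have the resulting type spec carry component types) comes down to making the parsed type recursive rather than a flat type name. Here is a minimal sketch of that idea, assuming a made-up angle-bracket syntax loosely modeled on BigQuery's ARRAY<...>/STRUCT<...> form. It is written in Python only to stay self-contained; it is not Calcite code, and `TypeSpec`, `tokenize`, and `parse_type_spec` are hypothetical names that do not correspond to anything in Parser.jj or SqlDataTypeSpec.

```python
import re
from dataclasses import dataclass

# Hypothetical AST node for a parsed type; NOT Calcite's SqlDataTypeSpec.
@dataclass(frozen=True)
class TypeSpec:
    name: str            # "INT", "VARCHAR", "ARRAY", "MAP", "ROW", ...
    args: tuple = ()     # component types: (elem,) for ARRAY, (key, value) for MAP
    members: tuple = ()  # (field_name, TypeSpec) pairs for ROW

_TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z_0-9]*|[<>,])")
_PUNCT = {"<", ">", ","}

def tokenize(text):
    """Split the input into identifiers and the punctuation < > ,"""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = _TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input near offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

class _Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {tok!r}")
        self.pos += 1
        return tok

    def identifier(self):
        tok = self.take()
        if tok in _PUNCT:
            raise ValueError(f"expected an identifier, got {tok!r}")
        return tok

    def parse_type(self):
        name = self.identifier().upper()
        if name == "ARRAY":
            # ARRAY<elem>: recurse for the element type.
            self.take("<")
            elem = self.parse_type()
            self.take(">")
            return TypeSpec("ARRAY", args=(elem,))
        if name == "MAP":
            # MAP<key, value>: recurse for both component types.
            self.take("<")
            key = self.parse_type()
            self.take(",")
            value = self.parse_type()
            self.take(">")
            return TypeSpec("MAP", args=(key, value))
        if name == "ROW":
            # ROW<name type, ...>: each field type is itself a full type.
            self.take("<")
            members = []
            while True:
                field_name = self.identifier()
                members.append((field_name, self.parse_type()))
                if self.peek() != ",":
                    break
                self.take(",")
            self.take(">")
            return TypeSpec("ROW", members=tuple(members))
        # Anything else is treated as a simple (flat) type name.
        return TypeSpec(name)

def parse_type_spec(text):
    parser = _Parser(text)
    spec = parser.parse_type()
    if parser.peek() is not None:
        raise ValueError(f"trailing input: {parser.peek()!r}")
    return spec
```

With this shape, something like `parse_type_spec("ROW<id INT, tags ARRAY<VARCHAR>>")` yields a ROW node whose second field holds an ARRAY node wrapping VARCHAR. A type spec that stores only a single type name has nowhere to keep those nested components, which is the limitation described in the thread.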

