Re: Complex Types Support in DDL


Seems logical to me, although I wonder if there's any way we could easily
make the DDL part of the parser modular. At least before going too far down
the road of implementing DDL in Babel, it would be good to set a clear
scope of what will exist in calcite-babel vs. calcite-server.

--
Michael Mior
mmior@xxxxxxxxxxxx

2018-05-02 12:57 GMT-04:00 Julian Hyde <jhyde@xxxxxxxxxx>:

> By the way. We should also figure out how this fits with the project to
> create a lenient parser that can handle any dialect of SQL. I am calling
> that parser “Babel”[1]. That parser will be able to handle BigQuery
> dialect, among others.
>
> Here’s my current thinking.
>
> I think that Babel should be a new module (a sibling to calcite-server,
> calcite-druid etc.) and its parser will extend the core parser. That means
> that calcite-babel will not inherit from the DDL parser in the
> calcite-server module, nor vice versa. We will probably end up with two
> parsers that are capable of handling DDL, and two sets of AST classes. But
> I think that is OK, or at least, better than the chaos of trying to reuse
> too much. At least, the parsers will share 99% of their DNA with the core
> parser. And we can easily share tests.
>
> Julian
>
> [1] https://issues.apache.org/jira/browse/CALCITE-2280
>
> > On May 1, 2018, at 11:16 PM, Shuyi Chen <suez1224@xxxxxxxxx> wrote:
> >
> > Hi Anton, thanks a lot for the great questions.
> >
> > Yes, SqlDataTypeSpec currently only supports creating simple SQL types;
> > row, array, and map types are not supported.
> >
> > CALCITE-2045 adds support for defining custom types, either simple or row
> > types, through the type DDL, and you should be able to use the UDT in your
> > table DDL for complex row types. I think this should be close to what you want.
> >
> > You can extend the type DDL in its current form in the BEAM parser and add
> > support for map and array types, or modify the grammar to suit your needs
> > and make it BigQuery compatible. All the changes required to support UDTs
> > in calcite-core should already be done by CALCITE-2045.
> >
> > As for the BigQuery syntax, I am not sure it's a good idea to adopt it
> > in the core parser unless there is no SQL equivalent, but if you implement it
> > in your extended BEAM parser, it's up to you, and that's by design of
> > Calcite DDL.
> >
> > Let me know if it helps.
> >
> > Thanks
> > Shuyi
> >
> > On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <kedin@xxxxxxxxxx.invalid>
> > wrote:
> >
> >> Hi,
> >>
> >> We want to add support for non-primitive types (ROW, ARRAY, MAP) to Apache
> >> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
> >> to approach this?
> >>
> >> *Our Use Case:*
> >>  We want to be able to use DDL to define data sources and sinks for Beam
> >> pipelines, so that users don't have to wrap SQL into custom code which
> >> configures sources/sinks.
> >>
> >> *What we have already:*
> >>  We have a customized CREATE TABLE statement which allows users to specify
> >> the type of the data source, its schema, and data location. The
> >> implementation is based on Calcite DDL extensions.
> >>
> >> *What we're missing:*
> >>  We need to be able to define schemas with non-primitive types, e.g.
> >> arrays or rows, so that we can correctly describe data sources and sinks
> >> which support such types. For example, if we want to manipulate data in a
> >> stream of JSON objects, we want to be able to describe the JSON contents
> >> somehow, including arrays or nested objects. Or we would need similar types
> >> to interact with BigQuery, which supports arrays and nested struct types.
> >>
> >> *Problem:*
> >>  I tried to check whether it is possible to extend the parser using the
> >> config.fmpp approach, so that we can hook into the Parser.TypeName()
> >> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
> >> method and parse the complex types ourselves. But Parser.DataType()
> >> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
> >> creates SqlDataTypeSpec only in two specific ways, with no way to extend it,
> >> so even if we parse the type name ourselves, we would not be able to
> >> construct the SqlDataTypeSpec in a way that supports arrays/rows. And even
> >> if we could, looking at SqlDataTypeSpec
> >> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L327>
> >> it seems that it does not support creating arrays or rows either: it calls
> >> typeFactory.createSqlType(typename)
> >> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5ebdd467d811/core/src/main/java/org/apache/calcite/sql/SqlDataTypeSpec.java#L350>
> >> which only
> >> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/SqlTypeFactoryImpl.java#L49>
> >> creates basic types in this call.
> >>
> >> *Path forward:*
> >>  If the above is correct, then it appears that we would need to patch
> >> Calcite in a couple of places to support arrays, rows, and maps in DDL:
> >>    - update Parser.jj to support parsing the type definitions for the
> >> required types and constructing SqlDataTypeSpec correctly for those cases;
> >>    - update SqlDataTypeSpec.java to handle complex types and invoke the
> >> correct typeFactory interfaces.
> >>
> >> *Questions:*
> >> - does the above sound sane/correct?
> >> - is there similar work already tracked in Calcite somewhere? I saw
> >> something mentioned in CALCITE-2045
> >> <https://issues.apache.org/jira/browse/CALCITE-2045?focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
> >> but didn't see any Jiras tracking this work specifically yet;
> >> - is there a known/recommended/working syntax for such DDL? If there is
> >> none, would it make sense to adopt something similar to the BigQuery
> >> STRUCT/ARRAY definition
> >> <https://cloud.google.com/bigquery/docs/data-definition-language>?
> >>
> >> Thank you,
> >> Anton
> >>
> >
> >
> >
> > --
> > "So you have to trust that the dots will somehow connect in your future."
>
>
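
Editor's illustration of the "Path forward" in Anton's original message: a minimal, standalone sketch of parsing nested type definitions into a tree that a type spec could later be built from. Everything in it is an assumption made for illustration. The class names (ComplexTypeSketch, TypeNode) are hypothetical and are not Calcite or Beam API, and the angle-bracket syntax (ARRAY<T>, MAP<K, V>, ROW<name T, ...>) is only one possible choice, since the thread does not settle on a syntax.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical sketch; none of these names exist in Calcite or Beam. */
final class ComplexTypeSketch {

  /** A parsed type: a type name plus nested component types. */
  static final class TypeNode {
    final String name;              // e.g. INT, ARRAY, MAP, ROW
    final List<String> fieldNames;  // populated only for ROW
    final List<TypeNode> children;  // empty for simple types

    TypeNode(String name, List<String> fieldNames, List<TypeNode> children) {
      this.name = name;
      this.fieldNames = fieldNames;
      this.children = children;
    }

    @Override public String toString() {
      if (children.isEmpty()) {
        return name;
      }
      StringBuilder sb = new StringBuilder(name).append('<');
      for (int i = 0; i < children.size(); i++) {
        if (i > 0) {
          sb.append(", ");
        }
        if (!fieldNames.isEmpty()) {
          sb.append(fieldNames.get(i)).append(' ');
        }
        sb.append(children.get(i));
      }
      return sb.append('>').toString();
    }
  }

  private final String src;
  private int pos;

  private ComplexTypeSketch(String src) {
    this.src = src;
  }

  /** Parses a whole type expression; rejects trailing input. */
  static TypeNode parse(String text) {
    ComplexTypeSketch p = new ComplexTypeSketch(text);
    TypeNode t = p.type();
    p.skipSpaces();
    if (p.pos != p.src.length()) {
      throw new IllegalArgumentException("Unexpected input at " + p.pos);
    }
    return t;
  }

  /** Recursive descent: a simple type name, or ARRAY/MAP/ROW with components. */
  private TypeNode type() {
    String name = identifier().toUpperCase(Locale.ROOT);
    List<String> fields = new ArrayList<>();
    List<TypeNode> children = new ArrayList<>();
    switch (name) {
      case "ARRAY":
        expect('<');
        children.add(type());
        expect('>');
        break;
      case "MAP":
        expect('<');
        children.add(type());   // key type
        expect(',');
        children.add(type());   // value type
        expect('>');
        break;
      case "ROW":
        expect('<');
        do {
          fields.add(identifier());
          children.add(type());
        } while (accept(','));
        expect('>');
        break;
      default:
        break;                  // simple type, no components
    }
    return new TypeNode(name, fields, children);
  }

  private String identifier() {
    skipSpaces();
    int start = pos;
    while (pos < src.length()
        && (Character.isLetterOrDigit(src.charAt(pos)) || src.charAt(pos) == '_')) {
      pos++;
    }
    if (start == pos) {
      throw new IllegalArgumentException("Expected identifier at " + pos);
    }
    return src.substring(start, pos);
  }

  private boolean accept(char c) {
    skipSpaces();
    if (pos < src.length() && src.charAt(pos) == c) {
      pos++;
      return true;
    }
    return false;
  }

  private void expect(char c) {
    if (!accept(c)) {
      throw new IllegalArgumentException("Expected '" + c + "' at " + pos);
    }
  }

  private void skipSpaces() {
    while (pos < src.length() && Character.isWhitespace(src.charAt(pos))) {
      pos++;
    }
  }
}
```

For example, ComplexTypeSketch.parse("ROW<id INT, tags ARRAY<VARCHAR>>") yields a ROW node with two named children. In a real patch, a tree like this would presumably be translated into Calcite types through RelDataTypeFactory methods such as createArrayType, createMapType, and createStructType rather than createSqlType alone; how that would be wired into Parser.jj and SqlDataTypeSpec is exactly the open question in the thread.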

