
Re: Complex Types Support in DDL

The principles are as follows:
 * Server should expose, as DDL, the concepts in Calcite’s framework, no more, no less. This includes the ability to define a type if supported by Calcite’s type system (RelDataTypeFactory), and the ability to define materialized views and lattices.
 * Babel should expose anything in a supported SQL dialect (or rather, anything that someone has found time to support).

Server’s specification is relatively fixed, whereas Babel’s specification is growing and changing all the time. 
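For instance, the kind of statement that is in scope for calcite-server looks roughly like this (a sketch; the exact grammar lives in the server module, and the type names here are illustrative):

```sql
-- A user-defined type built from types the RelDataTypeFactory can create
CREATE TYPE address_t AS (street VARCHAR(50), zip INTEGER);

-- A materialized view, also in scope for the server module
CREATE MATERIALIZED VIEW emp_summary AS
  SELECT deptno, COUNT(*) AS c FROM emp GROUP BY deptno;
```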


> On May 2, 2018, at 10:06 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> Seems logical to me, although I wonder if there's any way we could easily
> make the DDL part of the parser modular. At least before going too far down
> the road of implementing DDL in Babel, it would be good to set a clear
> scope of what will exist in calcite-babel vs. calcite-server.
> --
> Michael Mior
> mmior@xxxxxxxxxxxx <mailto:mmior@xxxxxxxxxxxx>
> 2018-05-02 12:57 GMT-04:00 Julian Hyde <jhyde@xxxxxxxxxx <mailto:jhyde@xxxxxxxxxx>>:
>> By the way. We should also figure out how this fits with the project to
>> create a lenient parser that can handle any dialect of SQL. I am calling
>> that parser “Babel”[1]. That parser will be able to handle BigQuery
>> dialect, among others.
>> Here’s my current thinking.
>> I think that Babel should be a new module (a sibling to calcite-server,
>> calcite-druid etc.) and its parser will extend the core parser. That means
>> that calcite-babel will not inherit from the DDL parser in the
>> calcite-server module, nor vice versa. We will probably end up with two
>> parsers that are capable of handling DDL, and two sets of AST classes. But
>> I think that is OK, or at least, better than the chaos of trying to reuse
>> too much. At least, the parsers will share 99% of their DNA with the core
>> parser. And we can easily share tests.
>> Julian
>> [1]
>>> On May 1, 2018, at 11:16 PM, Shuyi Chen <suez1224@xxxxxxxxx> wrote:
>>> Hi Anton, thanks a lot for the great questions.
>>> Yes, SqlDataTypeSpec currently only supports creating simple SQL types;
>>> row/array/map types are not supported.
>>> CALCITE-2045 adds support for defining custom types, either simple or
>>> row types, through the type DDL, and you should be able to use the UDT
>>> in your table DDL for a complex row type. I think this should be close
>>> to what you want.
>>> You can extend the type DDL in its current form in the Beam parser and
>>> add support for map and array types, or modify the grammar to tailor it
>>> to your needs and make it BigQuery-compatible. All the changes required
>>> to support UDTs in calcite-core should already be done by CALCITE-2045.
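>>> For example, the combination would look something like this (a sketch
>>> of the CALCITE-2045 style of syntax; type and table names are
>>> illustrative, and the exact grammar may differ):
>>>
>>> ```sql
>>> -- Define a row UDT, then use it in a table DDL
>>> CREATE TYPE point_t AS (x INTEGER, y INTEGER);
>>> CREATE TABLE shapes (id INTEGER, center point_t);
>>> ```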
>>> As for the BigQuery syntax, I am not sure it's a good idea to adopt it
>>> in the core parser unless there is no SQL equivalent; but if you
>>> implement it in your extended Beam parser, that's up to you, and that's
>>> by design of Calcite DDL.
>>> Let me know if it helps.
>>> Thanks
>>> Shuyi
>>> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <kedin@xxxxxxxxxx.invalid>
>>> wrote:
>>>> Hi,
>>>> We want to add support for non-primitive types (ROW, ARRAY, MAP) in
>>>> Apache Beam SQL DDL (based on the Calcite DDL extensions). What would
>>>> be the best way to approach this?
>>>> *Our Use Case:*
>>>> We want to be able to use DDL to define data sources and sinks for Beam
>>>> pipelines, so that users don't have to wrap SQL into custom code which
>>>> configures sources/sinks.
>>>> *What we have already:*
>>>> We have a customized CREATE TABLE statement which allows users to
>>>> specify the type of the data source, its schema, and the data
>>>> location. The implementation is based on the Calcite DDL extensions.
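>>>> For context, the statement has roughly this shape (illustrative only,
>>>> not the exact Beam grammar; the TYPE and LOCATION clauses and all
>>>> names here are hypothetical):
>>>>
>>>> ```sql
>>>> CREATE TABLE orders (id INTEGER, amount DOUBLE)
>>>> TYPE 'text'                      -- kind of source/sink
>>>> LOCATION '/path/to/orders.csv'   -- where the data lives
>>>> ```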
>>>> *What we're missing:*
>>>> We need to be able to define schemas with non-primitive types, e.g.
>>>> arrays or rows, so that we can correctly describe data sources and
>>>> sinks which support such types. For example, if we want to manipulate
>>>> data in a stream of JSON objects, we want to be able to describe the
>>>> JSON contents somehow, including arrays or nested objects. We would
>>>> also need similar types to interact with BigQuery, which supports
>>>> arrays and nested struct types.
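>>>> Something like the following is what we would like to be able to
>>>> express (a sketch using standard-SQL-style ROW and ARRAY constructors;
>>>> column names are illustrative):
>>>>
>>>> ```sql
>>>> CREATE TABLE events (
>>>>   id BIGINT,
>>>>   payload ROW(user_id BIGINT, tags VARCHAR(20) ARRAY),
>>>>   scores DOUBLE ARRAY
>>>> )
>>>> ```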
>>>> *Problem:*
>>>> I tried to check if it is possible to extend the parser using the
>>>> config.fmpp approach, so that we can hook into Parser.TypeName()
>>>> <db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
>>>> and parse the complex types ourselves. But Parser.DataType()
>>>> <db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
>>>> creates SqlDataTypeSpec in only two specific ways, with no ability to
>>>> extend it, so even if we parsed the type name ourselves, we would not
>>>> be able to construct the SqlDataTypeSpec in a way that supports
>>>> arrays/rows. And even if we could, looking at SqlDataTypeSpec
>>>> <bdd467d811/core/src/main/java/org/apache/calcite/sql/>
>>>> it seems that it does not support creating arrays or rows either: it
>>>> calls typeFactory.createSqlType(typeName)
>>>> <bdd467d811/core/src/main/java/org/apache/calcite/sql/>
>>>> which
>>>> <062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/>
>>>> only creates basic types in this call.
>>>> *Path forward:*
>>>> If the above is correct, then it appears that we would need to patch
>>>> Calcite in a couple of places to support arrays, rows, and maps in DDL:
>>>>   - update Parser.jj to parse the type definitions for the required
>>>> types and construct SqlDataTypeSpec correctly for those cases;
>>>>   - update SqlDataTypeSpec to handle complex types and invoke the
>>>> correct typeFactory interfaces;
>>>> *Questions:*
>>>> - does the above sound sane/correct?
>>>> - is there similar work already tracked in Calcite somewhere? I saw
>>>> something mentioned in a comment on CALCITE-2045
>>>> <focusedCommentId=16351203&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
>>>> but didn't see any tracking Jiras specifically for this work yet;
>>>> - is there a known/recommended/working syntax for such DDL? If there
>>>> is none, would it make sense to adopt something similar to the
>>>> BigQuery definition language?
>>>> Thank you,
>>>> Anton
>>> --
>>> "So you have to trust that the dots will somehow connect in your future."