[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Complex Types Support in DDL

By the way. We should also figure out how this fits with the project to create a lenient parser that can handle any dialect of SQL. I am calling that parser “Babel”[1]. That parser will be able to handle BigQuery dialect, among others.

Here’s my current thinking.

I think that Babel should be a new module (a sibling to calcite-server, calcite-druid etc.) and its parser will extend the core parser. That means that calcite-babel will not inherit from the DDL parser in the calcite-server module, nor vice versa. We will probably end up with two parsers that are capable of handling DDL, and two sets of AST classes. But I think that is OK, or at least, better than the chaos of trying to reuse too much. At least, the parsers will share 99% of their DNA with the core parser. And we can easily share tests.


[1] <>

> On May 1, 2018, at 11:16 PM, Shuyi Chen <suez1224@xxxxxxxxx> wrote:
> Hi Anton, thanks a lot for the great questions.
> Yes, SqlDataTypeSpec currently only support creating simple SQL types, no
> row/array/map is supported.
> CALCITE-2045 adds support for defining custom either simple or row types
> through the type DDL, and you should be able to use the UDT in your Table
> DDL for complex row type. I think this should be close to what you want.
> You can extend current type DDL in its current form in BEAM parser and add
> support for map and array type, or modify the grammar to tailor your need
> to make it BigQuery compatible. All the required change for supporting UDT
> in calcite-core should be already done by CALCITE-2045.
> As for the big query syntax, I am not sure if it's a good idea to adopt it
> in core parser unless there is no SQL equivalent, but if you implement it
> in your extended BEAM parser, it's up to you and that's by design of
> Calcite DDL.
> Let me know if it helps.
> Thanks
> Shuyi
> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <kedin@xxxxxxxxxx.invalid>
> wrote:
>> Hi,
>> We want add support for non-primitive types (ROW, ARRAY, MAP) to Apache
>> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
>> to approach this?
>> *Our Use Case:*
>>  We want to be able to use DDL to define data sources and sinks for Beam
>> pipelines, so that users don't have to wrap SQL into custom code which
>> configures sources/sinks.
>> *What we have already:*
>>  We have a customized CREATE TABLE statement which allows users to specify
>> the type of the data source, its schema, and data location. The
>> implmentation is based on Calcite DDL extensions.
>> *What we're missing:*
>>  We need to be able to define schemas with non-primitive types, e.g.
>> arrays or rows, so that we can correctly describe data sources and sinks
>> which supports such types. For example if we want to manipulate data in a
>> stream of JSON objects, we want to be able to describe the JSON contents
>> somehow, including arrays or nested objects. Or we would need similar types
>> to interact with BigQuery which supports arrays and nested struct types.
>> *Problem:*
>>  I tried to check if it is possible to extend the parser using the
>> config.fmpp approach, so that we can hook into the Parser.TypeName()
>> <
>> db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
>> method and parse the complex types ourselves. But Parser.DataType()
>> <
>> db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
>> creates
>> SqlDataTypeSpec only in two specific ways, without ability to extend it, so
>> even if we parse the typename ourselves, we would not be able to construct
>> the SqlDataTypeSpec in a way that supports arrays/rows. But even if we
>> could, looking at SqlDataTypeSpec
>> <
>> bdd467d811/core/src/main/java/org/apache/calcite/sql/
>> it seems that it does not support creating arrays or rows as well: it calls
>> typeFactory.createSqlType(typename)
>> <
>> bdd467d811/core/src/main/java/org/apache/calcite/sql/
>> which
>> only
>> <
>> 062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/
>> creates basic types in this call.
>> *Path forward:*
>>  It the above is correct, then it appears that we would need to patch
>> Calcite in couple of places to support arrays, rows, and maps in DDL:
>>    - update Parser.jj to support parsing the type definitions for the
>> required types and constructing SqlDataTypeSpec correctly for those cases;
>>    - update to handle complex types and invoke
>> correct typeFactory interfaces;
>> *Questions:*
>> - does the above sound sane/correct?
>> - is there a similar work already tracked in Calcite somewhere? I saw
>> something mentioned in CALCITE-2045
>> <
>> focusedCommentId=16351203&page=com.atlassian.jira.
>> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
>> but didn't see any tracking Jiras specifically for this work yet;
>> - is there a known/recommended/working syntax for such DDL? If there is
>> none, then would it make sense to adopt something similar to BigQuery
>> definition <
>> language>
>> ?
>> Thank you,
>> Anton
> -- 
> "So you have to trust that the dots will somehow connect in your future."