Re: Complex Types Support in DDL
By the way, we should also figure out how this fits with the project to create a lenient parser that can handle any dialect of SQL. I am calling that parser “Babel”. It will be able to handle the BigQuery dialect, among others.
Here’s my current thinking.
I think that Babel should be a new module (a sibling to calcite-server, calcite-druid etc.) and its parser will extend the core parser. That means that calcite-babel will not inherit from the DDL parser in the calcite-server module, nor vice versa. We will probably end up with two parsers that are capable of handling DDL, and two sets of AST classes. But I think that is OK, or at least, better than the chaos of trying to reuse too much. At least, the parsers will share 99% of their DNA with the core parser. And we can easily share tests.
https://issues.apache.org/jira/browse/CALCITE-2280
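For concreteness, a Babel module would carry its own codegen config alongside the core Parser.jj template, roughly like the sketch below. This is only an illustration of the layout; the package, class, and file names are placeholders, not a committed design:

```
# Sketch of a possible babel/src/main/codegen/config.fmpp (names illustrative)
data: {
  parser: {
    # Package and class of the generated parser.
    package: "org.apache.calcite.sql.parser.babel",
    class: "SqlBabelParserImpl",

    # Additional grammar productions, merged into the core Parser.jj template.
    implementationFiles: [
      "parserImpls.ftl"
    ]
  }
}
```

This is the same extension mechanism the existing calcite-server module uses, which is why the two parsers would share almost all of their grammar with core while remaining independent of each other.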
> On May 1, 2018, at 11:16 PM, Shuyi Chen <suez1224@xxxxxxxxx> wrote:
> Hi Anton, thanks a lot for the great questions.
> Yes, SqlDataTypeSpec currently only supports creating simple SQL types; no
> row/array/map types are supported.
> CALCITE-2045 adds support for defining custom types, either simple or row
> types, through the type DDL, and you should be able to use the UDT in your
> Table DDL for complex row types. I think this should be close to what you want.
> You can extend the type DDL in its current form in the Beam parser and add
> support for map and array types, or modify the grammar to suit your needs
> and make it BigQuery-compatible. All the required changes for supporting
> UDTs in calcite-core should already be done by CALCITE-2045.
> As for the BigQuery syntax, I am not sure it's a good idea to adopt it
> in the core parser unless there is no SQL equivalent; but if you implement
> it in your extended Beam parser, that's up to you, and that is by design
> in Calcite DDL.
> Let me know if it helps.
> On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <email@example.com> wrote:
>> We want to add support for non-primitive types (ROW, ARRAY, MAP) to Apache
>> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
>> to approach this?
>> *Our Use Case:*
>> We want to be able to use DDL to define data sources and sinks for Beam
>> pipelines, so that users don't have to wrap SQL into custom code which
>> configures sources/sinks.
>> *What we have already:*
>> We have a customized CREATE TABLE statement which allows users to specify
>> the type of the data source, its schema, and data location. The
>> implementation is based on Calcite DDL extensions.
>> *What we're missing:*
>> We need to be able to define schemas with non-primitive types, e.g.
>> arrays or rows, so that we can correctly describe data sources and sinks
>> which support such types. For example, if we want to manipulate data in a
>> stream of JSON objects, we want to be able to describe the JSON contents
>> somehow, including arrays or nested objects. Or we would need similar types
>> to interact with BigQuery which supports arrays and nested struct types.
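[Inline illustration: the kind of schema DDL being asked for might look roughly like the sketch below. The syntax and the TYPE/LOCATION clauses are hypothetical here, shown only to make the use case concrete; Calcite does not currently parse this.]

```sql
-- Hypothetical syntax, for illustration only
CREATE TABLE events (
  user_id  BIGINT,
  attrs    MAP<VARCHAR, VARCHAR>,                        -- string key/value pairs
  tags     ARRAY<VARCHAR>,                               -- repeated field
  payload  ROW(name VARCHAR, scores ARRAY<DOUBLE>)       -- nested object
)
TYPE 'pubsub'
LOCATION 'projects/example/topics/example';
```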
>> I tried to check whether it is possible to extend the parser using the
>> config.fmpp approach, so that we can hook into the Parser.TypeName()
>> method and parse the complex types ourselves. But Parser.DataType()
>> constructs SqlDataTypeSpec in only two specific ways, without the ability
>> to extend it, so even if we parse the type name ourselves, we would not be
>> able to construct the SqlDataTypeSpec in a way that supports arrays/rows.
>> And even if we could, looking at SqlDataTypeSpec it seems that it does not
>> support creating arrays or rows either: it only creates basic types when
>> deriving the type.
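[Inline illustration: to make the nesting concrete, here is a tiny self-contained sketch, in plain Java rather than JavaCC, of the kind of recursive production an extended Parser.TypeName() would need. Every name here is invented for illustration; this is not Calcite code.]

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Minimal recursive-descent sketch of a nested type grammar:
 *   type := "ARRAY" "&lt;" type "&gt;"
 *         | "MAP" "&lt;" type "," type "&gt;"
 *         | "ROW" "(" name type { "," name type } ")"
 *         | simpleType
 * It returns a normalized string form of the parsed type.
 */
public class TypeNameSketch {
  private final List<String> tokens;
  private int pos = 0;

  TypeNameSketch(String input) {
    // Crude tokenizer: put spaces around punctuation, then split. Enough for a sketch.
    this.tokens = new ArrayList<>(Arrays.asList(
        input.replaceAll("([<>(),])", " $1 ").trim().split("\\s+")));
  }

  public static String parse(String input) {
    return new TypeNameSketch(input).type();
  }

  private String next() { return tokens.get(pos++); }
  private String peek() { return pos < tokens.size() ? tokens.get(pos) : ""; }

  private String type() {
    String t = next().toUpperCase(Locale.ROOT);
    switch (t) {
      case "ARRAY": {
        expect("<"); String elem = type(); expect(">");
        return "ARRAY<" + elem + ">";
      }
      case "MAP": {
        expect("<"); String k = type(); expect(","); String v = type(); expect(">");
        return "MAP<" + k + ", " + v + ">";
      }
      case "ROW": {
        expect("(");
        StringBuilder sb = new StringBuilder("ROW(");
        while (true) {
          String field = next();                    // field name, kept as written
          sb.append(field).append(' ').append(type());
          if (peek().equals(",")) { next(); sb.append(", "); } else break;
        }
        expect(")");
        return sb.append(')').toString();
      }
      default:
        return t;  // simple type such as INT, VARCHAR, DOUBLE
    }
  }

  private void expect(String tok) {
    String got = next();
    if (!got.equals(tok)) {
      throw new IllegalArgumentException("expected " + tok + " but got " + got);
    }
  }
}
```

The point of the sketch is the recursion: ARRAY/MAP/ROW productions call back into the type production, which is exactly what the current single-level SqlDataTypeSpec construction does not allow.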
>> *Path forward:*
>> If the above is correct, then it appears that we would need to patch
>> Calcite in a couple of places to support arrays, rows, and maps in DDL:
>> - update Parser.jj to support parsing the type definitions for the
>> required types and constructing SqlDataTypeSpec correctly for those cases;
>> - update SqlDataTypeSpec.java to handle complex types and invoke
>> correct typeFactory interfaces;
>> - does the above sound sane/correct?
>> - is there a similar work already tracked in Calcite somewhere? I saw
>> something mentioned in CALCITE-2045
>> but didn't see any tracking Jiras specifically for this work yet;
>> - is there a known/recommended/working syntax for such DDL? If there is
>> none, then would it make sense to adopt something similar to the BigQuery
>> definition <https://cloud.google.com/bigquery/docs/data-definition-
>> Thank you,