git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Complex Types Support in DDL


Hi Anton, thanks a lot for the great questions.

Yes, SqlDataTypeSpec currently only support creating simple SQL types, no
row/array/map is supported.

CALCITE-2045 adds support for defining custom either simple or row types
through the type DDL, and you should be able to use the UDT in your Table
DDL for complex row type. I think this should be close to what you want.

You can extend current type DDL in its current form in BEAM parser and add
support for map and array type, or modify the grammar to tailor your need
to make it BigQuery compatible. All the required change for supporting UDT
in calcite-core should be already done by CALCITE-2045.

As for the big query syntax, I am not sure if it's a good idea to adopt it
in core parser unless there is no SQL equivalent, but if you implement it
in your extended BEAM parser, it's up to you and that's by design of
Calcite DDL.

Let me know if it helps.

Thanks
Shuyi

On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <kedin@xxxxxxxxxx.invalid>
wrote:

> Hi,
>
> We want add support for non-primitive types (ROW, ARRAY, MAP) to Apache
> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
> to approach this?
>
> *Our Use Case:*
>   We want to be able to use DDL to define data sources and sinks for Beam
> pipelines, so that users don't have to wrap SQL into custom code which
> configures sources/sinks.
>
> *What we have already:*
>   We have a customized CREATE TABLE statement which allows users to specify
> the type of the data source, its schema, and data location. The
> implmentation is based on Calcite DDL extensions.
>
> *What we're missing:*
>   We need to be able to define schemas with non-primitive types, e.g.
> arrays or rows, so that we can correctly describe data sources and sinks
> which supports such types. For example if we want to manipulate data in a
> stream of JSON objects, we want to be able to describe the JSON contents
> somehow, including arrays or nested objects. Or we would need similar types
> to interact with BigQuery which supports arrays and nested struct types.
>
> *Problem:*
>   I tried to check if it is possible to extend the parser using the
> config.fmpp approach, so that we can hook into the Parser.TypeName()
> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0
> db4d048a77/core/src/main/codegen/templates/Parser.jj#L4439>
> method and parse the complex types ourselves. But Parser.DataType()
> <https://github.com/apache/calcite/blob/a5d520df76602d25ed66627f08f5e0
> db4d048a77/core/src/main/codegen/templates/Parser.jj#L4377>
> creates
> SqlDataTypeSpec only in two specific ways, without ability to extend it, so
> even if we parse the typename ourselves, we would not be able to construct
> the SqlDataTypeSpec in a way that supports arrays/rows. But even if we
> could, looking at SqlDataTypeSpec
> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5e
> bdd467d811/core/src/main/java/org/apache/calcite/sql/
> SqlDataTypeSpec.java#L327>
> it seems that it does not support creating arrays or rows as well: it calls
> typeFactory.createSqlType(typename)
> <https://github.com/apache/calcite/blob/09be7e74a6a4d1b1c4f640c8e69b5e
> bdd467d811/core/src/main/java/org/apache/calcite/sql/
> SqlDataTypeSpec.java#L350>
> which
> only
> <https://github.com/apache/calcite/blob/f47465236b7650f2280092b708fa39
> 062fe79ffd/core/src/main/java/org/apache/calcite/sql/type/
> SqlTypeFactoryImpl.java#L49>
> creates basic types in this call.
>
> *Path forward:*
>   It the above is correct, then it appears that we would need to patch
> Calcite in couple of places to support arrays, rows, and maps in DDL:
>     - update Parser.jj to support parsing the type definitions for the
> required types and constructing SqlDataTypeSpec correctly for those cases;
>     - update SqlDataTypeSpec.java to handle complex types and invoke
> correct typeFactory interfaces;
>
> *Questions:*
>  - does the above sound sane/correct?
>  - is there a similar work already tracked in Calcite somewhere? I saw
> something mentioned in CALCITE-2045
> <https://issues.apache.org/jira/browse/CALCITE-2045?
> focusedCommentId=16351203&page=com.atlassian.jira.
> plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16351203>,
> but didn't see any tracking Jiras specifically for this work yet;
>  - is there a known/recommended/working syntax for such DDL? If there is
> none, then would it make sense to adopt something similar to BigQuery
> STRUCT/ARRAY
> definition <https://cloud.google.com/bigquery/docs/data-definition-
> language>
> ?
>
> Thank you,
> Anton
>



-- 
"So you have to trust that the dots will somehow connect in your future."


( ! ) Warning: include(msgfooter.php): failed to open stream: No such file or directory in /var/www/git/apache-calcite-development/msg03381.html on line 178
Call Stack
#TimeMemoryFunctionLocation
10.0007358472{main}( ).../msg03381.html:0

( ! ) Warning: include(): Failed opening 'msgfooter.php' for inclusion (include_path='.:/var/www/git') in /var/www/git/apache-calcite-development/msg03381.html on line 178
Call Stack
#TimeMemoryFunctionLocation
10.0007358472{main}( ).../msg03381.html:0