
Re: Complex Types Support in DDL

The CREATE TYPE statement that Shuyi added is probably fairly
close to what you want.
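For reference, here is roughly what that statement accepts (a sketch from
memory of the server module's tests; verify the exact attribute-list
syntax against SqlCreateType before relying on it):

```sql
-- Structured type with named attributes (syntax from memory; check
-- the server module's parser tests for the authoritative form)
CREATE TYPE address_t AS (
  street VARCHAR(100),
  city VARCHAR(50),
  zip INTEGER
);
```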

But if it isn't exactly what you want, that raises the question of how
we modify the DDL parser. I anticipated that each system's DDL
language would be significantly different from every other system's. I
may be mistaken - we should discuss - but if I am correct, then the
template mechanism (config.fmpp etc.) used by the core query parser
will not work for DDL. The core DDL parser would become an
unmaintainable mess.

Also, I don't want to find myself arguing about what the DDL syntax of
your project should be. You don't like the "IF EXISTS" part of "DROP
TABLE IF EXISTS foo"? Fine, leave it out of your grammar.

I propose that you use copy-paste. Create your own version of the
"server" module. If yours stays fairly close to ours, we should be
able to apply patches (or at the very least, test cases) to both.
Now, if what you are thinking of adding are standard, or widely
supported, extensions, we would consider putting them in the "server"
parser. An ARRAY type constructor seems to meet those criteria. (In
fact, as a type constructor, it belongs in the core language, not DDL,
because it could be used in CAST, among other things. So maybe you
already have what you need.)
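For example (the CAST form is standard SQL collection-type syntax;
whether the core parser accepts it in every position shown here is worth
verifying):

```sql
-- Collection type constructor used in the core language, not DDL:
SELECT CAST(c AS INTEGER ARRAY) FROM t;

-- If DDL reuses the core DataType production, the same constructor
-- would work in a column definition (hypothetical):
CREATE TABLE t2 (vals INTEGER ARRAY);
```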


On Tue, May 1, 2018 at 3:21 PM, Anton Kedin <kedin@xxxxxxxxxx.invalid> wrote:
> Hi,
> We want to add support for non-primitive types (ROW, ARRAY, MAP) to Apache
> Beam SQL DDL (based on Calcite DDL extensions). What would be the best way
> to approach this?
> *Our Use Case:*
>   We want to be able to use DDL to define data sources and sinks for Beam
> pipelines, so that users don't have to wrap SQL into custom code which
> configures sources/sinks.
> *What we have already:*
>   We have a customized CREATE TABLE statement which allows users to specify
> the type of the data source, its schema, and data location. The
> implementation is based on Calcite DDL extensions.
> *What we're missing:*
>   We need to be able to define schemas with non-primitive types, e.g.
> arrays or rows, so that we can correctly describe data sources and sinks
> which support such types. For example, if we want to manipulate data in a
> stream of JSON objects, we want to be able to describe the JSON contents
> somehow, including arrays or nested objects. Or we would need similar types
> to interact with BigQuery which supports arrays and nested struct types.
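> To make this concrete, the kind of schema we would like to express looks
> something like this (hypothetical syntax, loosely modeled on BigQuery's
> ARRAY/STRUCT types; not meant as a concrete proposal):
>
> ```sql
> CREATE TABLE events (
>   id BIGINT,
>   payload ROW(name VARCHAR, score DOUBLE),  -- nested object
>   tags VARCHAR ARRAY,                       -- repeated field
>   attrs MAP<VARCHAR, VARCHAR>               -- key-value pairs
> );
> ```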
> *Problem:*
>   I tried to check whether it is possible to extend the parser using the
> config.fmpp approach, so that we can hook into the Parser.TypeName()
> method and parse the complex types ourselves. But Parser.DataType()
> creates SqlDataTypeSpec in only two specific ways, without the ability
> to extend it, so even if we parse the type name ourselves, we would not
> be able to construct the SqlDataTypeSpec in a way that supports
> arrays/rows. And even if we could, it seems that SqlDataTypeSpec itself
> does not support creating arrays or rows either: it calls
> typeFactory.createSqlType(typename), which only creates basic types.
> *Path forward:*
>   If the above is correct, then it appears that we would need to patch
> Calcite in a couple of places to support arrays, rows, and maps in DDL:
>     - update Parser.jj to support parsing the type definitions for the
> required types and constructing SqlDataTypeSpec correctly for those cases;
>     - update SqlDataTypeSpec to handle complex types and invoke the
> correct typeFactory interfaces;
> *Questions:*
>  - does the above sound sane/correct?
>  - is there similar work already tracked in Calcite somewhere? I saw
> something mentioned in CALCITE-2045, but didn't see any tracking Jiras
> specifically for this work yet;
>  - is there a known/recommended/working syntax for such DDL? If there is
> none, would it make sense to adopt something similar to the BigQuery
> definition?
> Thank you,
> Anton