Re: Elasticsearch Adapter. Removal of Mapping Types (by vendor). Index == Table


Maybe there could be a separator char as one of the adapter’s parameters. People should choose a value, say ‘$’ or ‘#’, that is legal in an unquoted SQL identifier but does not occur in any of their index or type names.
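
To make the ambiguity concrete, here is a minimal Python sketch (not adapter code; the helper name `candidate_splits` is hypothetical) that lists every index/type pair a combined table name could denote for a given separator. It uses the "x_y" / "z" example from earlier in the thread.

```python
def candidate_splits(table_name, separator):
    """Return every (index, type) pair that table_name could denote.

    Hypothetical helper for illustration only; it is not part of Calcite.
    """
    parts = table_name.split(separator)
    # Each position between two parts is a possible index/type boundary.
    return [
        (separator.join(parts[:i]), separator.join(parts[i:]))
        for i in range(1, len(parts))
    ]


# '_' also occurs inside the names, so the boundary is ambiguous.
print(candidate_splits("x_y_z", "_"))   # [('x', 'y_z'), ('x_y', 'z')]

# '$' occurs in neither name, so there is exactly one way to split.
print(candidate_splits("x_y$z", "$"))   # [('x_y', 'z')]
```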

If the parameter is not specified, the adapter would fall back to a simple mode, say looking for indexes first and then for types, and people would need to make sure that indexes and types have distinct names. After the transition to single-type indexes, people could stop using the parameter.
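
A rough Python sketch of that lookup, under stated assumptions: the function `resolve_table` and the `types_by_index` dictionary (index name to set of type names) are hypothetical and only illustrate the idea. With a separator, the name is split once; without one, indexes are tried first, then types.

```python
def resolve_table(name, types_by_index, separator=None):
    """Map a table name to (index, type); type is None when index == table.

    Hypothetical sketch: types_by_index maps each index name to its type names.
    """
    if separator is not None and separator in name:
        # Separator mode: the separator is assumed not to occur in any name.
        index, _, type_name = name.partition(separator)
        if type_name in types_by_index.get(index, set()):
            return index, type_name
        raise KeyError(f"no such index/type: {name}")
    # Simple mode: look for indexes first ...
    if name in types_by_index:
        return name, None
    # ... then for types, which must be unique across indexes.
    matches = sorted(
        (index, type_name)
        for index, type_names in types_by_index.items()
        for type_name in type_names
        if type_name == name
    )
    if len(matches) == 1:
        return matches[0]
    raise KeyError(f"unknown or ambiguous table name: {name}")


cluster = {"x_y": {"z"}, "x": {"y_z"}, "logs": {"doc"}}
print(resolve_table("x_y$z", cluster, separator="$"))  # ('x_y', 'z')
print(resolve_table("logs", cluster))                  # ('logs', None)
print(resolve_table("doc", cluster))                   # ('logs', 'doc')
```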

Julian


> On Jun 29, 2018, at 4:43 PM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
> 
> That's a valid point. Then the user would define a different pattern like
> "i$index_t$type" for his cluster.
> 
> I think we should first answer whether such scenarios should be supported
> by Calcite (given that they're already deprecated by the vendor). If yes,
> what should the collision strategy be? A user-defined pattern like the one
> above, failure, or an auto-generated name?
> 
> On Fri, Jun 29, 2018, 19:14 Julian Hyde <jhyde@xxxxxxxxxx> wrote:
> 
>>> In elastic, an (index/type) pair is guaranteed to be unique, therefore
>>> "${index}_${type}" will also be unique (as a string). This is only necessary
>>> when we have several types per index. A valid question is whether the user
>>> should be allowed such flexibility.
>> 
>> Uniqueness is not my concern.
>> 
>> Suppose there is an index called "x_y" with a type called "z", and
>> another index called "x" with a type called "y_z". If I write "x_y_z"
>> it's not clear how it should be broken into index/type.
>> 
>> 
>> On Fri, Jun 29, 2018 at 3:15 PM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
>>>> Can you show how those examples affect SQL against the ES adapter and/or
>>>> how they affect JSON models?
>>> 
>>> The discussion is about how to properly bridge the (index/type) concept from ES
>>> into the relational world. The proposal to use placeholders ($index / $type) affects
>>> only how a table is named in Calcite. They're not used as SQL literals, i.e. it
>>> affects only the configuration phase of the schema.
>>> Pretty much we're doing a string replace to derive the table name from
>>> ($index/$type).
>>> 
>>>> You seem to be using '_' as a separator character. Are we sure that
>>>> people will never use it in index or type name? Separator characters
>>>> often cause problems.
>>> In elastic, an (index/type) pair is guaranteed to be unique, therefore
>>> "${index}_${type}" will also be unique (as a string). This is only necessary
>>> when we have several types per index. A valid question is whether the user
>>> should be allowed such flexibility.
>>> 
>>> 
>>> 
>>> On Fri, Jun 29, 2018 at 2:19 PM Julian Hyde <jhyde@xxxxxxxxxx> wrote:
>>> 
>>>> Andrei,
>>>> 
>>>> I'm not an ES user so I don't fully understand this issue, but my two
>>>> cents anyway...
>>>> 
>>>> Can you show how those examples affect SQL against the ES adapter
>>>> and/or how they affect JSON models?
>>>> 
>>>> You seem to be using '_' as a separator character. Are we sure that
>>>> people will never use it in index or type name? Separator characters
>>>> often cause problems.
>>>> 
>>>> Julian
>>>> 
>>>> 
>>>> 
>>>> 
>>>> On Fri, Jun 29, 2018 at 10:58 AM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
>>>>> I agree there should be a configuration option. How about the following
>>>>> approach?
>>>>> 
>>>>> Expose both variables ${index} and ${type} in the configuration (JSON), and the
>>>>> user will use them to generate the table name in the Calcite schema.
>>>>> 
>>>>> Example
>>>>> "table_name": "${type}" // current
>>>>> "table_name": "${index}" // new (default?)
>>>>> "table_name": "${index}_${type}" // most generic; supports multiple types per index
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> On Fri, Jun 29, 2018 at 9:26 AM Michael Mior <mmior@xxxxxxxxxx> wrote:
>>>>> 
>>>>>> I think it sounds like you and Andrei are in a good position to tackle this
>>>>>> one so I'm happy to have you both work on whatever solution you think is
>>>>>> best.
>>>>>> 
>>>>>> --
>>>>>> Michael Mior
>>>>>> mmior@xxxxxxxxxx
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> On Fri, Jun 29, 2018 at 04:19, Christian Beikov <christian.beikov@xxxxxxxxx>
>>>>>> wrote:
>>>>>> 
>>>>>>> IMO the best solution would be to make it configurable by introducing a
>>>>>>> "table_mapping" config with values
>>>>>>> 
>>>>>>>  * type - every type in the known indices is mapped as table
>>>>>>>  * index - every known index is mapped as table
>>>>>>> 
>>>>>>> We'd probably also need a "type_field" configuration for defining which
>>>>>>> field to use for the type determination, as one of the possible future
>>>>>>> ways to do things is to introduce a custom field:
>>>>>>> 
>>>>>>> https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html#_custom_type_field_2
>>>>>>> 
>>>>>>> We already detect the ES version, so we can set a smart default for this
>>>>>>> setting. Let's make the index config param optional.
>>>>>>> 
>>>>>>>  * When no index is given, we discover indexes; the default for
>>>>>>>    "table_mapping" then is "index"
>>>>>>>  * When an index is given, then we only discover types according to the
>>>>>>>    "type_field" configuration, and the default for "table_mapping" is
>>>>>>>    "type"
>>>>>>> 
>>>>>>> This would also allow to discover indexes but still use "type" as
>>>>>>> "table_mapping".
>>>>>>> 
>>>>>>> What do you think?
>>>>>>> 
>>>>>>> Kind regards,
>>>>>>> ------------------------------------------------------------------------
>>>>>>> *Christian Beikov*
>>>>>>> On 29.06.2018 at 02:41, Andrei Sereda wrote:
>>>>>>>> Yes. There is an API to list all indexes / types in elastic. They can be
>>>>>>>> automatically imported into a schema.
>>>>>>>> 
>>>>>>>> What needs to be agreed upon is how to expose those elements in the Calcite
>>>>>>>> schema (naming / behaviour).
>>>>>>>> 
>>>>>>>> 1) Many (most?) setups are single type per index. The natural way to name
>>>>>>>> them would be "elastic.$index" (elastic being the schema name). Multiple indexes
>>>>>>>> would be under the same schema: "elastic.index1", "elastic.index2", etc.
>>>>>>>> 
>>>>>>>> 2) What if an index has several types? Should they be exported as Calcite tables
>>>>>>>> "elastic.$index_type1" and "elastic.$index_type2"? Or (current behaviour) as
>>>>>>>> "elastic.type1" and "elastic.type2"? Or as a subschema
>>>>>>>> "elastic.$index.type1"?
>>>>>>>> 
>>>>>>>> Now what if one has a combination of (1) and (2)?
>>>>>>>> Setup (2) is already deprecated (and will be unsupported in the next version).
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Jun 28, 2018 at 7:31 PM Christian Beikov <christian.beikov@xxxxxxxxx>
>>>>>>>> wrote:
>>>>>>>> 
>>>>>>>>> Is there an API to discover indexes? If there is, I'd suggest we allow a
>>>>>>>>> config option to make the adapter discover the possible indexes.
>>>>>>>>> We'd still have to adapt the code a bit, but internally, the schema
>>>>>>>>> could just keep a cache of type name to index name map and be able to
>>>>>>>>> support both scenarios.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> Kind regards,
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> *Christian Beikov*
>>>>>>>>> On 29.06.2018 at 00:12, Andrei Sereda wrote:
>>>>>>>>>>> 1) What's the time horizon for the current adapter no longer working
>>>>>>>>>>> with these changes to ES?
>>>>>>>>>> The current adapter will keep working for a while with existing setups. The
>>>>>>>>>> problem is nomenclature and ease of use.
>>>>>>>>>> 
>>>>>>>>>> Their new SQL concepts mapping
>>>>>>>>>> <https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>
>>>>>>>>>> drops the notion of ES type (which before was the equivalent of an RDBMS table) and
>>>>>>>>>> uses the ES index as the new table equivalent (before, an ES index was equal to a database).
>>>>>>>>>> Most users use elastic this way (one type, one index): index == table.
>>>>>>>>>> 
>>>>>>>>>> Currently Calcite requires a schema per index. In RDBMS parlance, a database per
>>>>>>>>>> table (I'd like to change that).
>>>>>>>>>> 
>>>>>>>>>>> 2) Any guess how complicated it would be to maintain code paths for both
>>>>>>>>>>> behaviours? I know this is probably really challenging to estimate, but I
>>>>>>>>>>> really have no idea of the scope of these changes. Would it mean two
>>>>>>>>>>> different ES adapters?
>>>>>>>>>> One can have just separate Calcite schema implementations (same adapter /
>>>>>>>>>> module):
>>>>>>>>>> 1)  LegacySchema (old). A schema can have only one index (but multiple
>>>>>>>>>> types). Type == table in this case.
>>>>>>>>>> 2)  NewSchema (new). A single schema can have multiple indexes (type is
>>>>>>>>>> dropped). Index == table in this case.
>>>>>>>>>> 
>>>>>>>>>>> 3) Do we really need compatibility with the current version of the adapter?
>>>>>>>>>>> IMO this depends on what versions of ES we would lose support for and how
>>>>>>>>>>> complex it would be for users of the current ES adapter to make updates for
>>>>>>>>>>> any Calcite API changes.
>>>>>>>>>> The issue is not in the adapter but in how the Calcite schema exposes tables. Should
>>>>>>>>>> it expose an index as an individual table (new), or an ES type (old)?
>>>>>>>>>> Andrei.
>>>>>>>>>> 
>>>>>>>>>> On Thu, Jun 28, 2018 at 5:23 PM Michael Mior <mmior@xxxxxxxxxx> wrote:
>>>>>>>>>> 
>>>>>>>>>>> Unfortunately I know very little about ES so I'm not in a great position to
>>>>>>>>>>> assess the impact of these changes. I will say that legacy
>>>>>>>>>>> compatibility is great, but maintaining two sets of logic is always a
>>>>>>>>>>> challenge. A few follow-up questions:
>>>>>>>>>>> 
>>>>>>>>>>> 1) What's the time horizon for the current adapter no longer working with
>>>>>>>>>>> these changes to ES?
>>>>>>>>>>> 
>>>>>>>>>>> 2) Any guess how complicated it would be to maintain code paths for both
>>>>>>>>>>> behaviours? I know this is probably really challenging to estimate, but I
>>>>>>>>>>> really have no idea of the scope of these changes. Would it mean two
>>>>>>>>>>> different ES adapters?
>>>>>>>>>>> 
>>>>>>>>>>> 3) Do we really need compatibility with the current version of the adapter?
>>>>>>>>>>> IMO this depends on what versions of ES we would lose support for and how
>>>>>>>>>>> complex it would be for users of the current ES adapter to make updates for
>>>>>>>>>>> any Calcite API changes.
>>>>>>>>>>> 
>>>>>>>>>>> Thanks for your continued work on the ES adapter Andrei!
>>>>>>>>>>> 
>>>>>>>>>>> --
>>>>>>>>>>> Michael Mior
>>>>>>>>>>> mmior@xxxxxxxxxx
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> On Thu, Jun 28, 2018 at 12:57, Andrei Sereda <andrei@xxxxxxxxx> wrote:
>>>>>>>>>>>> Hello,
>>>>>>>>>>>> 
>>>>>>>>>>>> Elastic announced
>>>>>>>>>>>> <https://www.elastic.co/guide/en/elasticsearch/reference/master/removal-of-types.html>
>>>>>>>>>>>> that they will be deprecating mapping types in ES6 and indexes will be
>>>>>>>>>>>> single-typed only.
>>>>>>>>>>>> 
>>>>>>>>>>>> The historical analogy <https://www.elastic.co/blog/index-vs-type> between
>>>>>>>>>>>> RDBMS and elastic was that an index is equivalent to a database and a type
>>>>>>>>>>>> corresponds to a table in that database. In a couple of releases (ES6-8)
>>>>>>>>>>>> this shall no longer be true.
>>>>>>>>>>>> 
>>>>>>>>>>>> The recent SQL addition
>>>>>>>>>>>> <https://www.elastic.co/blog/elasticsearch-6-3-0-released> to elastic
>>>>>>>>>>>> confirms this trend
>>>>>>>>>>>> <https://www.elastic.co/guide/en/elasticsearch/reference/current/_mapping_concepts_across_sql_and_elasticsearch.html>.
>>>>>>>>>>>> An index is equivalent to a table and there are no more ES types.
>>>>>>>>>>>> 
>>>>>>>>>>>> I would like to propose including this logic in the Calcite ES adapter, i.e.
>>>>>>>>>>>> expose each ES single-typed index as a separate table inside the Calcite
>>>>>>>>>>>> schema. This is in contrast to the current integration, where a schema can only
>>>>>>>>>>>> have a single index. The current approach forces you to create multiple schemas
>>>>>>>>>>>> to query single-typed indexes (on the same ES cluster).
>>>>>>>>>>>> 
>>>>>>>>>>>> Legacy compatibility can always be controlled with configuration
>>>>>>>>>>>> parameters.
>>>>>>>>>>>> 
>>>>>>>>>>>> Do you agree with such changes? If yes, would you consider a PR?
>>>>>>>>>>>> 
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Andrei.
>>>>>>>>>>>> 