git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)


I added comments to the case. I did not look at the PR, sorry.

> On Dec 26, 2018, at 1:56 PM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
> 
> Hello,
> 
> Can somebody pls take a look at the following PR:
> 
> PR: https://github.com/apache/calcite/pull/982
> JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> 
> Specifically, I would like to understand if adding explicit mapping
> <https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75>
> between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a correct
> (and recommended) approach. The reason I need it is because the original
> item name (eg. `_MAP['foo.bar']`) is lost after converter operation (ie I'm
> not able to derive it from `input.getRowType()`)
> 
> ```sql
> -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and 'foo.bar'
> select _MAP['_id'], _MAP['foo.bar'] from elastic
> 
> -- Query below preserves attribute information
> select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> \"foo.bar\" from elastic)
> ```
> 
> Many thanks,
> Andrei.
> 
> 
> 
> 
> On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <zabetak@xxxxxxxxx>
> wrote:
> 
>> Hi Andrei,
>> 
>> Many data management systems have internal columns which the users may be
>> able to query (see for example Postgres [1]). It doesn't sound unnatural to
>> allow users access these fields. On the other hand such system columns do
>> not appear in SELECT * queries and I suppose they are pruned out by
>> projections early on if they are not used by the query. I guess you could
>> take a similar approach in the ElasticSearch adapter.
>> 
>> Best,
>> Stamatis
>> 
>> [1] https://www.postgresql.org/docs/current/ddl-system-columns.html
>> 
>> 
>> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda <andrei@xxxxxxxxx>
>> έγραψε:
>> 
>>> Hello,
>>> 
>>> Upon indexing, elastic allocates a unique _id
>>> <
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
>>>> 
>>> for each document (unless specified by the user). Currently when mapping
>>> between elastic result and calcite return type we query only _source or
>>> fields attributes. _id is not part those but exposed at higher level in
>>> query response (see SearchHit
>>> <
>>> 
>> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
>>>> 
>>> ).
>>> 
>>> Currently  _MAP['foo'] becomes _source.get('foo') (or fields.get('foo')).
>>> 
>>> Do you think we should make _MAP['_id'] available implicitly ?
>>> 
>>> I have a couple of use-cases where one needs to know document ID.
>>> 
>>> Please note this makes sense only for non-aggregate queries.
>>> 
>>> Regards,
>>> 
>>> Andrei.
>>> 
>>