git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: ElasticSearch adapter. Exposing meta fields (like _id, _uid etc.)


Other than that it seems to me _MAP[‘_id’] will become a RexCall similar to
ITEM($0, ‘_ID’) so it appears to me that you can retrieve the field name by
implementing a RexVisitor.

Yes. That’s what I’m doing (see isItem
<https://github.com/apache/calcite/pull/982/files#diff-d1419ede28af115c0b45d6e02828cb6fR85>
)

I don’t see clearly why you need to set explicitly aliases but probably is
because I am missing implementation details related with the adapter itself.

For physical plan I get a list of expressions / projections (via RelDataType
/ getRowType) but at some point I need to know if (for example) EXPR$0 ==
ITEM($0, '_id') (_id is a system field and needs to be fetched differently
from payload). RelDataType doesn’t have item information.

That’s the only reason I’m manually saving this mapping. Perhaps there is a
better way to do it ?

On Fri, Dec 28, 2018 at 2:08 AM Stamatis Zampetakis <zabetak@xxxxxxxxx>
wrote:

> Hi Andrei,
>
> I tried to have a look in the PR but I am not familiar with the Elastic
> adapter code so I am missing a lot of context.
>
> In general, I think that relying on the on the field names coming from a
> RelDataType it is not safe. As you observed, it can get messed up quite
> easily from many parts of Calcite.
> It is usually better to try to reason using the position and/or the type of
> the field rather than its name.
>
> Other than that it seems to me _MAP['_id'] will become a RexCall similar to
> ITEM($0, '_ID') so it appears to me that you can retrieve the field name by
> implementing a RexVisitor.
> I don't see clearly why you need to set explicitly aliases but probably is
> because I am missing implementation details related with the adapter
> itself.
>
> Best,
> Stamatis
>
> Στις Πέμ, 27 Δεκ 2018 στις 8:36 μ.μ., ο/η Julian Hyde <jhyde@xxxxxxxxxx>
> έγραψε:
>
> > I added comments to the case. I did not look at the PR, sorry.
> >
> > > On Dec 26, 2018, at 1:56 PM, Andrei Sereda <andrei@xxxxxxxxx> wrote:
> > >
> > > Hello,
> > >
> > > Can somebody pls take a look at the following PR:
> > >
> > > PR: https://github.com/apache/calcite/pull/982
> > > JIRA: https://issues.apache.org/jira/browse/CALCITE-2755
> > >
> > > Specifically, I would like to understand if adding explicit mapping
> > > <
> >
> https://github.com/apache/calcite/pull/982/files#diff-f24822f58a8062386da5287cd57cae65R75
> > >
> > > between expression (`EXPR$n`) and item (`_MAP['foo.bar']`) is a correct
> > > (and recommended) approach. The reason I need it is because the
> original
> > > item name (eg. `_MAP['foo.bar']`) is lost after converter operation (ie
> > I'm
> > > not able to derive it from `input.getRowType()`)
> > >
> > > ```sql
> > > -- Here I get EXPR$0 and EXPR$1 but no information about 'id' and
> > 'foo.bar'
> > > select _MAP['_id'], _MAP['foo.bar'] from elastic
> > >
> > > -- Query below preserves attribute information
> > > select * from (select _MAP['_id'] as \"_id\", _MAP['foo.bar'] as
> > > \"foo.bar\" from elastic)
> > > ```
> > >
> > > Many thanks,
> > > Andrei.
> > >
> > >
> > >
> > >
> > > On Sat, Dec 22, 2018 at 3:11 AM Stamatis Zampetakis <zabetak@xxxxxxxxx
> >
> > > wrote:
> > >
> > >> Hi Andrei,
> > >>
> > >> Many data management systems have internal columns which the users may
> > be
> > >> able to query (see for example Postgres [1]). It doesn't sound
> > unnatural to
> > >> allow users access these fields. On the other hand such system columns
> > do
> > >> not appear in SELECT * queries and I suppose they are pruned out by
> > >> projections early on if they are not used by the query. I guess you
> > could
> > >> take a similar approach in the ElasticSearch adapter.
> > >>
> > >> Best,
> > >> Stamatis
> > >>
> > >> [1] https://www.postgresql.org/docs/current/ddl-system-columns.html
> > >>
> > >>
> > >> Στις Σάβ, 22 Δεκ 2018 στις 1:14 π.μ., ο/η Andrei Sereda
> > <andrei@xxxxxxxxx>
> > >> έγραψε:
> > >>
> > >>> Hello,
> > >>>
> > >>> Upon indexing, elastic allocates a unique _id
> > >>> <
> > >>>
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-id-field.html
> > >>>>
> > >>> for each document (unless specified by the user). Currently when
> > mapping
> > >>> between elastic result and calcite return type we query only _source
> or
> > >>> fields attributes. _id is not part those but exposed at higher level
> in
> > >>> query response (see SearchHit
> > >>> <
> > >>>
> > >>
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/6.1/_the_search_api.html
> > >>>>
> > >>> ).
> > >>>
> > >>> Currently  _MAP['foo'] becomes _source.get('foo') (or
> > fields.get('foo')).
> > >>>
> > >>> Do you think we should make _MAP['_id'] available implicitly ?
> > >>>
> > >>> I have a couple of use-cases where one needs to know document ID.
> > >>>
> > >>> Please note this makes sense only for non-aggregate queries.
> > >>>
> > >>> Regards,
> > >>>
> > >>> Andrei.
> > >>>
> > >>
> >
> >
>