git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: scrolling in ElasticSearch adapter


Hi Julian,

Scrolling (in elastic) does not only mean “open a cursor” but also iterate
over consistent snapshot. From docs:

The results that are returned from a scroll request reflect the state of
the index at the time that the initial search request was made, like a
snapshot in time. Subsequent changes to documents (index, update or delete)
will only affect later search requests.

So pagination (fetch / offset) can’t exactly replicate this functionality.

The problem with scrolling (in elastic) is that it is expensive and can’t
(shouldn’t) be enabled it by default.

There is one more “issue”. Currently select * from elastic returns at most
10 rows (in calcite). This is consistent with elastic behaviour which
limits result set to 10 documents (unless size is specified). When
returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
<https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
or SQL hint) does it mean return whole elastic index ? I’m not at ease with
returning different results based on hints or cursor settings.

Andrei.

On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jhyde.apache@xxxxxxxxx> wrote:

> It seems to me that Elasticsearch scroll means return a cursor - a
> collection of rows that you iterate over, and you may not read all of them.
> This is the default operation of JDBC.
>
> So, I guess we need to give the user a way to signal their intent to read
> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> hints[1] seem close to this. We would want the hints to be acted upon by
> both the optimizer and the JDBC transport.
>
> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> retrieve different pieces of a large result set in separate statements or
> (using query parameters) executions. It would be useful if the server could
> be given a hint to cache a statement across page requests.
>
> Julian
>
> [1]
> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
>
> > On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> christian.beikov@xxxxxxxxx> wrote:
> >
> > Hey,
> >
> > not sure if this should be an SQL keyword. JDBC specifies various
> constants that can be used at statement creation time:
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> >
> > Not sure though if or how these configurations are accessible for data
> stores or dialects, but IMO using these would be the proper way.
> >
> > Regards
> >
> > Christian
> >
> >> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> >> Hello,
> >>
> >>
> >> I was thinking about adding [scrolling functionality](
> >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> )
> >> to elastic search adapter. Since scrolling has non-negligible effect on
> the
> >> cluster it should be selectively enabled on per query basis. So, likely,
> >> user has to explicitly set "scroll flag" somewhere.
> >>
> >> Most natural way seems in SQL. [Calcite sql grammar](
> >> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> >> (unused to my knowledge). There were also discussions about adding
> hints to
> >> Calcite.
> >>
> >> ### Examples
> >> ```sql
> >> -- special sql keyword ?
> >> SCROLL select * from elastic;
> >>
> >> -- assuming hints are available in calcite
> >> /* HINT: scroll */ select * from elastic;
> >> ```
> >>
> >> What people think about this use-case ? Are there better ideas ?
> >>
> >> Regards,
> >> Andrei.
> >>
>