git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: scrolling in ElasticSearch adapter


> It seems to me that this behavior is tight with the notion of a transaction and the various isolation levels defined by the standard. If the transaction isolation level is for example "serializable" then executing the same query in the same transaction multiple times should not return different results (no dirty/phantom/repeatable reads). So for serializable transactions consistent snapshot should be the default .I think.

Yes it is closely related to transactions. Do you suggest enabling
scrolling based on [JDBC
transactions](https://docs.oracle.com/javase/tutorial/jdbc/basics/transactions.html)
? If so,
elastic doesn't natively supports transactions. Scrollable results are
"serializable" only for single query not for multiple (which might
confuse users using transactions on the client side).


> I would say that this is a bug. If the SQL does not specify a limit then the result set should bring back all documents (possibly in multiple batches depending on the fetch size).

I tend to agree, but pagination doesn't work for all ES aggregations
(only [for terms](https://www.elastic.co/guide/en/elasticsearch/reference/5.5/search-aggregations-bucket-terms-aggregation.html#_filtering_values_with_partitions)).
On the top of that scrolling doesn't work with aggregations.

> Hints and cursors should not return different results.

Agree

On Wed, Oct 24, 2018 at 6:16 PM Stamatis Zampetakis <zabetak@xxxxxxxxx> wrote:
>
> Hi Andrei,
>
> Andrei>Scrolling (in elastic) does not only mean “open a cursor” but also
> iterate
> over consistent snapshot...
>
> It seems to me that this behavior is tight with the notion of a transaction
> and the various isolation levels defined by the standard. If the
> transaction isolation level is  for example "serializable" then executing
> the same query in the same transaction multiple times should not return
> different results (no dirty/phantom/repeatable reads).
> So for serializable transactions consistent snapshot should be the default
> I think.
>
> Andrei>There is one more “issue”. Currently select * from elastic returns
> at most
> 10 rows (in calcite)...
>
> I would say that this is a bug. If the SQL does not specify a limit then
> the result set should bring back all documents (possibly in multiple
> batches depending on the fetch size).
>
> Andrei> I’m not at ease with returning different results based on hints or
> cursor settings.
>
> Hints and cursors should not return different results. They only affect how
> the results are obtained from the server and delivered to the client. The
> client may decide to iterate through all/some of them but that does not
> change the fact that the server must be able to provide the complete result.
>
> Best,
> Stamatis
>
> Στις Τετ, 24 Οκτ 2018 στις 9:39 μ.μ., ο/η Andrei Sereda <andrei@xxxxxxxxx>
> έγραψε:
>
> > Hi Julian,
> >
> > Scrolling (in elastic) does not only mean “open a cursor” but also iterate
> > over consistent snapshot. From docs:
> >
> > The results that are returned from a scroll request reflect the state of
> > the index at the time that the initial search request was made, like a
> > snapshot in time. Subsequent changes to documents (index, update or delete)
> > will only affect later search requests.
> >
> > So pagination (fetch / offset) can’t exactly replicate this functionality.
> >
> > The problem with scrolling (in elastic) is that it is expensive and can’t
> > (shouldn’t) be enabled it by default.
> >
> > There is one more “issue”. Currently select * from elastic returns at most
> > 10 rows (in calcite). This is consistent with elastic behaviour which
> > limits result set to 10 documents (unless size is specified). When
> > returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > <
> > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> > >
> > or SQL hint) does it mean return whole elastic index ? I’m not at ease with
> > returning different results based on hints or cursor settings.
> >
> > Andrei.
> >
> > On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jhyde.apache@xxxxxxxxx>
> > wrote:
> >
> > > It seems to me that Elasticsearch scroll means return a cursor - a
> > > collection of rows that you iterate over, and you may not read all of
> > them.
> > > This is the default operation of JDBC.
> > >
> > > So, I guess we need to give the user a way to signal their intent to read
> > > all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> > > hints[1] seem close to this. We would want the hints to be acted upon by
> > > both the optimizer and the JDBC transport.
> > >
> > > Related is pagination. SQL has FETCH and OFFSET, which allow you to
> > > retrieve different pieces of a large result set in separate statements or
> > > (using query parameters) executions. It would be useful if the server
> > could
> > > be given a hint to cache a statement across page requests.
> > >
> > > Julian
> > >
> > > [1]
> > > https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > >
> > > > On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > > christian.beikov@xxxxxxxxx> wrote:
> > > >
> > > > Hey,
> > > >
> > > > not sure if this should be an SQL keyword. JDBC specifies various
> > > constants that can be used at statement creation time:
> > > https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > > >
> > > > Not sure though if or how these configurations are accessible for data
> > > stores or dialects, but IMO using these would be the proper way.
> > > >
> > > > Regards
> > > >
> > > > Christian
> > > >
> > > >> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > > >> Hello,
> > > >>
> > > >>
> > > >> I was thinking about adding [scrolling functionality](
> > > >>
> > >
> > https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > > )
> > > >> to elastic search adapter. Since scrolling has non-negligible effect
> > on
> > > the
> > > >> cluster it should be selectively enabled on per query basis. So,
> > likely,
> > > >> user has to explicitly set "scroll flag" somewhere.
> > > >>
> > > >> Most natural way seems in SQL. [Calcite sql grammar](
> > > >> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> > > >> (unused to my knowledge). There were also discussions about adding
> > > hints to
> > > >> Calcite.
> > > >>
> > > >> ### Examples
> > > >> ```sql
> > > >> -- special sql keyword ?
> > > >> SCROLL select * from elastic;
> > > >>
> > > >> -- assuming hints are available in calcite
> > > >> /* HINT: scroll */ select * from elastic;
> > > >> ```
> > > >>
> > > >> What people think about this use-case ? Are there better ideas ?
> > > >>
> > > >> Regards,
> > > >> Andrei.
> > > >>
> > >
> >