git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: scrolling in ElasticSearch adapter


Christian,

I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
generally but have some reservations (questions) :

How to pass resultSetType / fetchSize from PreparedStatement to RelNodes ?
What if user doesn’t use JDBC (eg. RelBuilders) ?
On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
<christian.beikov@xxxxxxxxx> wrote:
>
> In JDBC one can configure a fetch size which would reflect the amount of
> rows to be fetched initially, but also subsequently.
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
>
> According to what you are writing, ES behvior is what
> TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't
> affected by changes.
>
> IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
> ... and view R1, R2, then R3 is deleted and you fetch the next rows, you
> wouldn't see R3.
>
> According to the JDBC spec
> (https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> ) you don't have to support all modes. Usually, user code doesn't use
> scrolling that much, but at least forward scrolling makes sense.
>
> Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > Hi Julian,
> >
> > Scrolling (in elastic) does not only mean “open a cursor” but also iterate
> > over consistent snapshot. From docs:
> >
> > The results that are returned from a scroll request reflect the state of
> > the index at the time that the initial search request was made, like a
> > snapshot in time. Subsequent changes to documents (index, update or delete)
> > will only affect later search requests.
> >
> > So pagination (fetch / offset) can’t exactly replicate this functionality.
> >
> > The problem with scrolling (in elastic) is that it is expensive and can’t
> > (shouldn’t) be enabled it by default.
> >
> > There is one more “issue”. Currently select * from elastic returns at most
> > 10 rows (in calcite). This is consistent with elastic behaviour which
> > limits result set to 10 documents (unless size is specified). When
> > returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > <https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE>
> > or SQL hint) does it mean return whole elastic index ? I’m not at ease with
> > returning different results based on hints or cursor settings.
> >
> > Andrei.
> >
> > On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jhyde.apache@xxxxxxxxx> wrote:
> >
> >> It seems to me that Elasticsearch scroll means return a cursor - a
> >> collection of rows that you iterate over, and you may not read all of them.
> >> This is the default operation of JDBC.
> >>
> >> So, I guess we need to give the user a way to signal their intent to read
> >> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> >> hints[1] seem close to this. We would want the hints to be acted upon by
> >> both the optimizer and the JDBC transport.
> >>
> >> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> >> retrieve different pieces of a large result set in separate statements or
> >> (using query parameters) executions. It would be useful if the server could
> >> be given a hint to cache a statement across page requests.
> >>
> >> Julian
> >>
> >> [1]
> >> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> >>
> >>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> >> christian.beikov@xxxxxxxxx> wrote:
> >>> Hey,
> >>>
> >>> not sure if this should be an SQL keyword. JDBC specifies various
> >> constants that can be used at statement creation time:
> >> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> >>> Not sure though if or how these configurations are accessible for data
> >> stores or dialects, but IMO using these would be the proper way.
> >>> Regards
> >>>
> >>> Christian
> >>>
> >>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> >>>> Hello,
> >>>>
> >>>>
> >>>> I was thinking about adding [scrolling functionality](
> >>>>
> >> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> >> )
> >>>> to elastic search adapter. Since scrolling has non-negligible effect on
> >> the
> >>>> cluster it should be selectively enabled on per query basis. So, likely,
> >>>> user has to explicitly set "scroll flag" somewhere.
> >>>>
> >>>> Most natural way seems in SQL. [Calcite sql grammar](
> >>>> https://calcite.apache.org/docs/reference.html) has `SCROLL` keyword
> >>>> (unused to my knowledge). There were also discussions about adding
> >> hints to
> >>>> Calcite.
> >>>>
> >>>> ### Examples
> >>>> ```sql
> >>>> -- special sql keyword ?
> >>>> SCROLL select * from elastic;
> >>>>
> >>>> -- assuming hints are available in calcite
> >>>> /* HINT: scroll */ select * from elastic;
> >>>> ```
> >>>>
> >>>> What people think about this use-case ? Are there better ideas ?
> >>>>
> >>>> Regards,
> >>>> Andrei.
> >>>>