git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: scrolling in ElasticSearch adapter


For the sake of the discussion, I outline below a few ideas which may be of
some use.

Andrei>..elastic doesn't natively supports transactions. Scrollable results
are "serializable" only for single query not for multiple (which might
confuse users using transactions on the client side).

One option is to implement the transaction mechanism on the layer above. I
am not sure how Elastic works but possibly you could store various
information like IndexSearchers snapshots in a (new) subclass of
CalciteConnectionImpl and then pass it along using an appropriate
DataContext. This looks like a lot of work and I guess is out of the scope
of a simple adapter.

On the other hand, you could possibly rely on the semantics of
con.setAutoCommit(false)
method of JDBC (a statement is completed, and therefore committed, right
after it is executed) and indicate that setting con.setAutoCommit(true) is
not supported.

Andrei>I tend to agree, but pagination doesn't work for all ES aggregations

In principle, you could decide not to push all agregations into Elastic and
let them in the EnumerableConvention (or another convention). It would be
less performant I suppose but possibly cleaner in terms of semantics.

Andrei>How to pass resultSetType / fetchSize from PreparedStatement to
RelNodes ? What if user doesn’t use JDBC (eg. RelBuilders)

The first thing that comes in mind is to exploit the DataContext but I
don't know if this is convenient for you.




Στις Πέμ, 25 Οκτ 2018 στις 2:15 π.μ., ο/η Andrei Sereda <andrei@xxxxxxxxx>
έγραψε:

> Christian,
>
> I like TYPE_SCROLL_INSENSITIVE / fetchSize in PreparedStatement
> generally but have some reservations (questions) :
>
> How to pass resultSetType / fetchSize from PreparedStatement to RelNodes ?
> What if user doesn’t use JDBC (eg. RelBuilders) ?
> On Wed, Oct 24, 2018 at 6:28 PM Christian Beikov
> <christian.beikov@xxxxxxxxx> wrote:
> >
> > In JDBC one can configure a fetch size which would reflect the amount of
> > rows to be fetched initially, but also subsequently.
> >
> https://docs.oracle.com/javase/7/docs/api/java/sql/Statement.html#setFetchSize(int)
> >
> > According to what you are writing, ES behvior is what
> > TYPE_SCROLL_INSENSITIVE would do i.e. provide a snapshot view that isn't
> > affected by changes.
> >
> > IMO TYPE_SCROLL_SENSITIVE means that if you have rows R1, R2, R3, R4,
> > ... and view R1, R2, then R3 is deleted and you fetch the next rows, you
> > wouldn't see R3.
> >
> > According to the JDBC spec
> > (
> https://docs.oracle.com/javase/7/docs/api/java/sql/Connection.html#prepareStatement(java.lang.String,%20int,%20int)
> > ) you don't have to support all modes. Usually, user code doesn't use
> > scrolling that much, but at least forward scrolling makes sense.
> >
> > Am 24.10.2018 um 21:38 schrieb Andrei Sereda:
> > > Hi Julian,
> > >
> > > Scrolling (in elastic) does not only mean “open a cursor” but also
> iterate
> > > over consistent snapshot. From docs:
> > >
> > > The results that are returned from a scroll request reflect the state
> of
> > > the index at the time that the initial search request was made, like a
> > > snapshot in time. Subsequent changes to documents (index, update or
> delete)
> > > will only affect later search requests.
> > >
> > > So pagination (fetch / offset) can’t exactly replicate this
> functionality.
> > >
> > > The problem with scrolling (in elastic) is that it is expensive and
> can’t
> > > (shouldn’t) be enabled it by default.
> > >
> > > There is one more “issue”. Currently select * from elastic returns at
> most
> > > 10 rows (in calcite). This is consistent with elastic behaviour which
> > > limits result set to 10 documents (unless size is specified). When
> > > returning a cursor (eg. using JDBC TYPE_SCROLL_SENSITIVE
> > > <
> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#TYPE_SCROLL_SENSITIVE
> >
> > > or SQL hint) does it mean return whole elastic index ? I’m not at ease
> with
> > > returning different results based on hints or cursor settings.
> > >
> > > Andrei.
> > >
> > > On Wed, Oct 24, 2018 at 3:02 PM Julian Hyde <jhyde.apache@xxxxxxxxx>
> wrote:
> > >
> > >> It seems to me that Elasticsearch scroll means return a cursor - a
> > >> collection of rows that you iterate over, and you may not read all of
> them.
> > >> This is the default operation of JDBC.
> > >>
> > >> So, I guess we need to give the user a way to signal their intent to
> read
> > >> all rows versus only the first few. Oracle’s FIRST_ROWS and ALL_ROWS
> > >> hints[1] seem close to this. We would want the hints to be acted upon
> by
> > >> both the optimizer and the JDBC transport.
> > >>
> > >> Related is pagination. SQL has FETCH and OFFSET, which allow you to
> > >> retrieve different pieces of a large result set in separate
> statements or
> > >> (using query parameters) executions. It would be useful if the server
> could
> > >> be given a hint to cache a statement across page requests.
> > >>
> > >> Julian
> > >>
> > >> [1]
> > >>
> https://docs.oracle.com/cd/B10500_01/server.920/a96533/hintsref.htm#4924
> > >>
> > >>> On Oct 24, 2018, at 11:19 AM, Christian Beikov <
> > >> christian.beikov@xxxxxxxxx> wrote:
> > >>> Hey,
> > >>>
> > >>> not sure if this should be an SQL keyword. JDBC specifies various
> > >> constants that can be used at statement creation time:
> > >> https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html
> > >>> Not sure though if or how these configurations are accessible for
> data
> > >> stores or dialects, but IMO using these would be the proper way.
> > >>> Regards
> > >>>
> > >>> Christian
> > >>>
> > >>>> Am 24.10.2018 um 18:44 schrieb Andrei Sereda:
> > >>>> Hello,
> > >>>>
> > >>>>
> > >>>> I was thinking about adding [scrolling functionality](
> > >>>>
> > >>
> https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
> > >> )
> > >>>> to elastic search adapter. Since scrolling has non-negligible
> effect on
> > >> the
> > >>>> cluster it should be selectively enabled on per query basis. So,
> likely,
> > >>>> user has to explicitly set "scroll flag" somewhere.
> > >>>>
> > >>>> Most natural way seems in SQL. [Calcite sql grammar](
> > >>>> https://calcite.apache.org/docs/reference.html) has `SCROLL`
> keyword
> > >>>> (unused to my knowledge). There were also discussions about adding
> > >> hints to
> > >>>> Calcite.
> > >>>>
> > >>>> ### Examples
> > >>>> ```sql
> > >>>> -- special sql keyword ?
> > >>>> SCROLL select * from elastic;
> > >>>>
> > >>>> -- assuming hints are available in calcite
> > >>>> /* HINT: scroll */ select * from elastic;
> > >>>> ```
> > >>>>
> > >>>> What people think about this use-case ? Are there better ideas ?
> > >>>>
> > >>>> Regards,
> > >>>> Andrei.
> > >>>>
>