Re: Approximate query processing in Calcite


That's a question I can raise with the VerdictDB folks. However, I doubt it
makes sense for them to invest the time to make the change without a clear
tangible benefit for them. They already have a working system and it's not
clear what they would gain by swapping out so much of their internals for
Calcite. That said, there does seem to be some interest in using Calcite's
parser and possibly its relational algebra.

--
Michael Mior
mmior@xxxxxxxxxxxx


On Mon, May 7, 2018 at 15:34, Edmon Begoli <ebegoli@xxxxxxxxxx> wrote:

> A perhaps idealistic question, but would it be possible to do a little bit
> of engineering and architectural swap, where we incorporate VerdictDB’s
> unique parsing, optimization and other components, and then have them use
> Calcite?
>
> Again, this is probably an easy thing to say, and much harder to do, but I
> ask nevertheless — what would it take to do it?
>
> On Mon, May 7, 2018 at 15:16 Julian Hyde <jhyde@xxxxxxxxxx> wrote:
>
> > I also noticed that they have a Veeline module [1], a fork of SQLline [2]
> > that I maintain.
> >
> > No complaints about that (copy-paste is re-use, and re-use is good!) but
> > if they want to contribute their changes back I’d be glad to have them.
> >
> > Julian
> >
> > [1] https://github.com/mozafari/verdictdb/tree/master/veeline
> >
> > [2] https://github.com/julianhyde/sqlline
> >
> > > On May 7, 2018, at 11:19 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> > >
> > > You are correct that there are a lot of pieces the systems could probably
> > > share. In fact, mentioning some of the other systems using Calcite's parser
> > > drew immediate interest so I think that's something they're exploring. It
> > > seems as though they may also want to explore using Calcite's relational
> > > algebra.
> > >
> > > As far as selectively enabling AQP, I'm guessing the current answer would
> > > be that if you want exact answers, connect directly to the underlying DB,
> > > otherwise, expect VerdictDB to give an approximate answer. I can see why
> > > this might not be a great solution in some deployment scenarios though.
> > >
> > > --
> > > Michael Mior
> > > mmior@xxxxxxxxxxxx
> > >
> > >
> > > On Mon, May 7, 2018 at 13:57, Julian Hyde <jhyde@xxxxxxxxxx> wrote:
> > >
> > >> In many ways VerdictDB has a similar architecture to Calcite - a query
> > >> mediation layer that understands SQL and sends modified SQL to the
> > >> back-end.
> > >>
> > >> I think of approximate query processing as a form of materialized view
> > >> rewrite. In order to answer the query you obviously have to read some
> > >> data, but if you read the original data the I/O cost will be too high.
> > >> Therefore you have to read some kind of summary / synopsis of the data.
> > >> That summary is a kind of materialized view.
> > >>
> > >> As such, I expect that VerdictDB will need to build similar pieces to
> > >> what we have already built (parser, JDBC driver, relational algebra,
> > >> materialized view rewrites, SQL dialect support). They’re welcome to
> > >> share.
> > >>
> > >> One gripe I’ve had with several approximate query processing systems is
> > >> the inability to control whether to use approximation. For some queries
> > >> I can use approximation, for other queries I can use approximation for
> > >> some measures but not others. I wish that approximate query processing
> > >> systems gave exact results by default, but allowed users to add
> > >> “approximate” clauses into queries to say where they accept
> > >> approximations.
> > >>
> > >> Julian
> > >>
> > >>
> > >>> On May 7, 2018, at 10:29 AM, Xiening Dai <xndai.git@xxxxxxxx> wrote:
> > >>>
> > >>> Hi Michael,
> > >>>
> > >>> AQP is valuable in our business scenarios. Oftentimes our data
> > >>> scientists issue exploratory queries to get a basic sense of the data
> > >>> (means, aggregations on certain groupings, etc). But from my
> > >>> understanding, VerdictDB is independent of the database system and the
> > >>> query planner/optimizer. I wonder what you want to achieve specifically
> > >>> through Calcite integration?
> > >>>
> > >>>
> > >>>> On May 7, 2018, at 10:05 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> > >>>>
> > >>>> Edmon (and others),
> > >>>>
> > >>>> I'd be curious to hear more about your specific use cases if you're
> > >>>> able to share. Especially those who have companies which may benefit
> > >>>> from using AQP with Calcite to lower costs.
> > >>>>
> > >>>> --
> > >>>> Michael Mior
> > >>>> mmior@xxxxxxxxxxxx
> > >>>>
> > >>>>
> > >>>> On Thu, May 3, 2018 at 18:58, Edmon Begoli <ebegoli@xxxxxxxxx> wrote:
> > >>>>
> > >>>>> I am excited that you are considering taking Calcite in this
> > >>>>> direction.
> > >>>>>
> > >>>>> Approximate querying and probabilistic databases are of great
> > >>>>> interest to me, and I might be able to provide some applied research
> > >>>>> scenarios.
> > >>>>>
> > >>>>> One domain that comes to mind where we had some use cases is sensor
> > >>>>> data analysis.
> > >>>>>
> > >>>>> Thank you,
> > >>>>> Edmon
> > >>>>>
> > >>>>> On Thu, May 3, 2018 at 6:54 PM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> > >>>>>
> > >>>>>> Hi all,
> > >>>>>>
> > >>>>>> I recently had a chat with the VerdictDB (http://verdictdb.org/)
> > >>>>>> team about possible integration with Calcite. VerdictDB sits between
> > >>>>>> an application and a database to enable the approximation of query
> > >>>>>> results which are expected to be highly accurate while consuming
> > >>>>>> significantly fewer resources on the backend.
> > >>>>>>
> > >>>>>> I'm curious to talk to anyone who might have a use case for this.
> > >>>>>> Particularly those using Calcite to power analytics systems that can
> > >>>>>> tolerate approximate results. We'll likely be looking at putting
> > >>>>>> together a proof of concept in the next few weeks if there's any
> > >>>>>> interest. Let me know!
> > >>>>>>
> > >>>>>> --
> > >>>>>> Michael Mior
> > >>>>>> mmior@xxxxxxxxxxxx
> > >>>>>>
> > >>>>>
> > >>>
> > >>
> > >>
> >
> >
>
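
Two ideas from the thread can be sketched in a few lines: an approximate answer comes from reading a summary of the data instead of the data itself, and approximation could be something the caller opts into per query rather than the default. The sketch below is only an illustration under stated assumptions. The names `build_sample` and `query_sum` are hypothetical and are not part of VerdictDB or Calcite, and a plain uniform row sample stands in for whatever synopsis a real system would maintain.

```python
import random


def build_sample(rows, fraction, seed=0):
    """Build the summary: keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [r for r in rows if rng.random() < fraction]


def query_sum(rows, sample, fraction, approximate=False):
    """Return SUM over the data (hypothetical helper, for illustration only).

    Exact by default. When the caller opts in with approximate=True, only the
    much smaller sample is read and its total is scaled up by 1/fraction.
    """
    if not approximate:
        return sum(rows)            # full scan of the original data
    return sum(sample) / fraction   # scan of the summary only


rows = list(range(1, 10001))        # stand-in for one column of a large table
fraction = 0.1                      # assumed sampling rate
sample = build_sample(rows, fraction)

exact = query_sum(rows, sample, fraction)
approx = query_sum(rows, sample, fraction, approximate=True)
print(exact, approx, len(sample))
```

The approximate path touches roughly a tenth of the rows, which is where the I/O saving would come from. A real system would also report an error bound alongside the estimate, which this sketch omits.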