Re: Approximate query processing in Calcite


I also noticed that they have a Veeline module [1], a fork of SQLline [2] that I maintain.

No complaints about that (copy-paste is re-use, and re-use is good!), but if they want to contribute their changes back I’d be glad to have them.

Julian

[1] https://github.com/mozafari/verdictdb/tree/master/veeline

[2] https://github.com/julianhyde/sqlline

> On May 7, 2018, at 11:19 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> 
> You are correct that there are a lot of pieces the systems could probably
> share. In fact, mentioning some of the other systems using Calcite's parser
> drew immediate interest so I think that's something they're exploring. It
> seems as though they may also want to explore using Calcite's relational
> algebra.
> 
> As far as selectively enabling AQP, I'm guessing the current answer would
> be that if you want exact answers, connect directly to the underlying DB,
> otherwise, expect VerdictDB to give an approximate answer. I can see why
> this might not be a great solution in some deployment scenarios though.
> 
> --
> Michael Mior
> mmior@xxxxxxxxxxxx
> 
> 
> On Mon, May 7, 2018 at 13:57, Julian Hyde <jhyde@xxxxxxxxxx> wrote:
> 
>> In many ways VerdictDB has a similar architecture to Calcite - a query
>> mediation layer that understands SQL and sends modified SQL to the back-end.
>> 
>> I think of approximate query processing as a form of materialized view
>> rewrite. In order to answer the query you obviously have to read some data,
>> but if you read the original data the I/O cost will be too high. Therefore
>> you have to read some kind of summary / synopsis of the data. That summary
>> is a kind of materialized view.
>> 
>> As such, I expect that VerdictDB will need to build similar pieces to what
>> we have already built (parser, JDBC driver, relational algebra,
>> materialized view rewrites, SQL dialect support). They’re welcome to share.
>> 
>> One gripe I’ve had with several approximate query processing systems is
>> the inability to control whether to use approximation. For some queries I
>> can use approximation; for other queries I can use approximation for some
>> measures but not others. I wish that approximate query processing systems
>> gave exact results by default, but allowed users to add “approximate”
>> clauses into queries to say where they accept approximations.
>> 
>> Julian
>> 
>> 
>>> On May 7, 2018, at 10:29 AM, Xiening Dai <xndai.git@xxxxxxxx> wrote:
>>> 
>>> Hi Michael,
>>> 
>>> AQP is valuable in our business scenarios. Oftentimes our data scientists
>>> issue exploratory queries to get a basic sense of the data (means,
>>> aggregations on certain groupings, etc.). But from my understanding,
>>> VerdictDB is independent of the database system and the query
>>> planner/optimizer. I wonder what you want to achieve specifically through
>>> Calcite integration?
>>> 
>>> 
>>>> On May 7, 2018, at 10:05 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
>>>> 
>>>> Edmon (and others),
>>>> 
>>>> I'd be curious to hear more about your specific use cases if you're able to
>>>> share, especially from those whose companies may benefit from using AQP
>>>> with Calcite to lower costs.
>>>> 
>>>> --
>>>> Michael Mior
>>>> mmior@xxxxxxxxxxxx
>>>> 
>>>> 
>>>> On Thu, May 3, 2018 at 18:58, Edmon Begoli <ebegoli@xxxxxxxxx> wrote:
>>>> 
>>>>> I am excited that you are considering taking Calcite in this direction.
>>>>> 
>>>>> Approximate querying and probabilistic databases are of great interest to
>>>>> me, and I might be able to provide some applied research scenarios.
>>>>> 
>>>>> One domain that comes to mind where we had some use cases is sensor data
>>>>> analysis.
>>>>> 
>>>>> Thank you,
>>>>> Edmon
>>>>> 
>>>>> On Thu, May 3, 2018 at 6:54 PM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
>>>>> 
>>>>>> Hi all,
>>>>>> 
>>>>>> I recently had a chat with the VerdictDB (http://verdictdb.org/) team about
>>>>>> possible integration with Calcite. VerdictDB sits between an application
>>>>>> and a database to enable the approximation of query results which are
>>>>>> expected to be highly accurate while consuming significantly fewer
>>>>>> resources on the backend.
>>>>>> 
>>>>>> I'm curious to talk to anyone who might have a use case for this.
>>>>>> Particularly those using Calcite to power analytics systems that can
>>>>>> tolerate approximate results. We'll likely be looking at putting together a
>>>>>> proof of concept in the next few weeks if there's any interest. Let me
>>>>>> know!
>>>>>> 
>>>>>> --
>>>>>> Michael Mior
>>>>>> mmior@xxxxxxxxxxxx
>>>>>> 
>>>>> 
>>> 
>> 
>> 
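Two ideas from the quoted discussion can be sketched in a few lines: a precomputed sample acting as the "summary" that stands in for the original data (the materialized-view analogy), and approximation being opt-in rather than the default. This is only an illustration under stated assumptions. The names build_sample, avg, and the approximate flag are hypothetical and are not part of the VerdictDB or Calcite APIs, and the sensor readings are made-up data.

```python
import random
import statistics


def build_sample(rows, fraction, seed=0):
    """Materialize a uniform random sample: the 'summary' that is cheap to read."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [row for row in rows if rng.random() < fraction]


def avg(rows, sample, column, approximate=False):
    """Average of `column`: exact by default, sample-based only on request."""
    # Fall back to the full data if the sample happens to be empty.
    source = sample if (approximate and sample) else rows
    return statistics.fmean(row[column] for row in source)


# Hypothetical sensor readings; values cycle through 0.0 .. 99.0.
readings = [{"temp": float(i % 100)} for i in range(10_000)]
sample = build_sample(readings, fraction=0.1)

exact = avg(readings, sample, "temp")                     # scans every row
approx = avg(readings, sample, "temp", approximate=True)  # scans the sample only
print(len(sample), exact, approx)
```

The caller has to ask for the approximate answer explicitly, which mirrors the wish for exact results by default with approximation allowed only where it is requested. A real system would also report an error bound alongside the estimate; this sketch leaves that out.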


