Re: Approximate query processing in Calcite


Hello Michael and Xiening,

Due to an NDA, I cannot go into detail, but I recall one of my previous projects where the sponsor wanted to run a kind of approximate query over RDBMSs in geo-distributed environments.

Roughly speaking, they wanted to determine whether an entity of interest could be connected to blacklisted entities, in order to detect fraud.

For instance, they wanted the approximate number of rows reached by repeating a fixed number of join operations (~9 to 10) starting from a specific row. This is equivalent to counting the vertices reachable from a source vertex within k hops in a graph. They also wanted to run a few other variants of such queries. The reason was that it was enough for them to know whether the starting row was reachable from one of the blacklisted nodes, rather than getting back the exact set of nodes reached.
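
To make the graph formulation concrete, here is a minimal Python sketch of the exact (non-approximate) version of the two questions: how many vertices are within k hops of a source, and whether any blacklisted vertex is among them. The adjacency-dict representation, the function names, and the toy data are assumptions for illustration only; they do not come from the project described here or from Calcite or VerdictDB.

```python
from collections import deque


def k_hop_reachable(adj, source, k):
    """Return the set of vertices within k hops of source (source included)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue  # do not expand beyond k hops
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen


def touches_blacklist(adj, source, k, blacklist):
    """Return True as soon as any blacklisted vertex is found within k hops."""
    if source in blacklist:
        return True
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == k:
            continue
        for nxt in adj.get(node, ()):
            if nxt in blacklist:
                return True  # early exit: the full reachable set is not needed
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return False


# Hypothetical toy graph: each edge stands for one join step between rows.
adj = {"a": ["b"], "b": ["c"], "c": ["d"], "d": []}
reachable_count = len(k_hop_reachable(adj, "a", 2)) - 1  # exclude the source
hit_within_2 = touches_blacklist(adj, "a", 2, {"d"})
hit_within_3 = touches_blacklist(adj, "a", 3, {"d"})
```

The early exit in `touches_blacklist` reflects the point that a yes/no answer is enough. An approximate variant could, for example, sample neighbors at each hop instead of expanding all of them, at the cost of possibly missing a connection.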

I was wondering whether this sounds like an interesting application.

Thank you,
Seung-Hwan


> On May 10, 2018, at 9:43 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
> 
> Xiening,
> 
> Sorry for the delayed response. I think one of the interesting areas where
> there is a potential benefit is when dealing with datastores that don't
> directly support the queries you're trying to ask. For example, the
> Cassandra adapter allows joins across CQL tables which is not possible in
> Cassandra. That said, the performance of these joins will generally be
> terrible. However, it's possible that with AQP we would be able to get
> reasonable answers to queries that are otherwise completely impractical.
> 
> In any case, I'd love to hear your findings on speaking with customers.
> Thanks!
> 
> --
> Michael Mior
> mmior@xxxxxxxxxxxx
> 
> 
> On Mon, May 7, 2018 at 5:06 PM, Xiening Dai <xndai.git@xxxxxxxx> wrote:
> 
>> However, we are interested in exploring ways that a deeper integration
>> might be beneficial especially in the case of federated query processing
>> across multiple database backends.
>> 
>> 
>> Can you elaborate a bit more on this? I can understand that there are
>> common functionalities that can be shared, and that it makes sense to share
>> (parsing, relational algebra, etc.). But I am more curious about what new
>> scenarios and/or performance benefits can be achieved through the integration.
>> 
>> I personally work on the infrastructure side, and am not a direct user
>> of the data system. But some of our customers do express interest in
>> approximate query support. I’d like to talk to them and survey them
>> further. I would be happy to share my findings with you after that. Thanks.
>> 
>> 
>> On May 7, 2018, at 11:15 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
>> 
>> Xiening,
>> 
>> You are correct that VerdictDB is currently completely DBMS independent.
>> However, we are interested in exploring ways that a deeper integration
>> might be beneficial especially in the case of federated query processing
>> across multiple database backends. The first step would be to simply allow
>> Calcite and VerdictDB to work together. That is potentially highly useful
>> in itself since then it should be possible to perform AQP over any
>> Calcite-supported backend.
>> 
>> If you'd be willing to discuss potential use cases further, I'd love to try
>> to schedule a call with you.
>> 
>> --
>> Michael Mior
>> mmior@xxxxxxxxxxxx
>> 
>> 
>> On Mon, May 7, 2018 at 1:30 PM, Xiening Dai <xndai.git@xxxxxxxx> wrote:
>> 
>> Hi Michael,
>> 
>> AQP is valuable in our business scenarios. Oftentimes our data scientists
>> issue exploratory queries to get a basic sense of the data (means,
>> aggregations on certain groupings, etc.). But from my understanding, VerdictDB
>> is independent of the database system and the query planner/optimizer. I
>> wonder what you want to achieve specifically through Calcite integration?
>> 
>> 
>> On May 7, 2018, at 10:05 AM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
>> 
>> Edmon (and others),
>> 
>> I'd be curious to hear more about your specific use cases if you're able to
>> share. Especially those who have companies which may benefit from using AQP
>> with Calcite to lower costs.
>> 
>> --
>> Michael Mior
>> mmior@xxxxxxxxxxxx
>> 
>> 
>> On Thu, May 3, 2018 at 6:58 PM, Edmon Begoli <ebegoli@xxxxxxxxx> wrote:
>> 
>> I am excited that you are considering taking Calcite in this direction.
>> 
>> Approximate querying and probabilistic databases are of great interest to
>> me, and I might be able to provide some applied research scenarios.
>> 
>> One domain that comes to mind where we had some use cases is sensor data
>> analysis.
>> 
>> Thank you,
>> Edmon
>> 
>> On Thu, May 3, 2018 at 6:54 PM, Michael Mior <mmior@xxxxxxxxxxxx> wrote:
>> 
>> Hi all,
>> 
>> I recently had a chat with the VerdictDB (http://verdictdb.org/) team about
>> possible integration with Calcite. VerdictDB sits between an application
>> and a database to enable the approximation of query results which are
>> expected to be highly accurate while consuming significantly fewer
>> resources on the backend.
>> 
>> I'm curious to talk to anyone who might have a use case for this.
>> Particularly those using Calcite to power analytics systems that can
>> tolerate approximate results. We'll likely be looking at putting together a
>> proof of concept in the next few weeks if there's any interest. Let me know!
>> 
>> --
>> Michael Mior
>> mmior@xxxxxxxxxxxx
>> 
>> 
>> 
>> 
>> 
>> 


