git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Contribution of IN operator handling


PS Here’s why we should focus on the SQL-to-JDBC example: it is concrete. If get it right, I’m sure the other adapters will benefit. We need to be sure not to over-fit the internal representation to SQL semantics (e.g. SQL’s IN allows composite values, e.g. “(x, y) IN ((1, 2), (3, 4))”, has a particular null semantics, allows expressions and column references on both LHS and RHS, e.g. “(x, 1) IN (1, y - 5), (2, y + 1)”). If we stick to RexNode with OR, AND, =, then the semantics are clear, and other adapters can implement those semantics using their particular language constructs.

> On Nov 20, 2018, at 1:21 PM, Julian Hyde <jhyde@xxxxxxxxxx> wrote:
> 
> Yes, the other adapters need to write their own converters, specific to their target language. DB languages tend to be divergent in how they represent “column in (constant1, …, constantN)”[1][2], so this makes sense. Of course they can borrow/share code with the JDBC adapter.
> 
> Julian
> 
> [1] https://docs.mongodb.com/manual/tutorial/query-documents/#specify-conditions-using-query-operators <https://docs.mongodb.com/manual/tutorial/query-documents/#specify-conditions-using-query-operators>
> 
> [2] http://druid.io/docs/latest/querying/filters.html#in-filter <http://druid.io/docs/latest/querying/filters.html#in-filter>
> 
> 
> 
>> On Nov 20, 2018, at 12:14 PM, Andrei Sereda <andrei@xxxxxxxxx <mailto:andrei@xxxxxxxxx>> wrote:
>> 
>> Hi Julian,
>> 
>> Thanks for your reply. I have a question
>> 
>> … where t is a table based on a JDBC data source …
>> Do we agree on that goal?
>> 
>> Does that mean that each of adapters (Elastic, Geode, Mongo etc.) have to
>> implement their own version of converters (similar to RexToSqlConverter) ?
>> If it is a flat RexCall of ORs this should be doable, I guess.
>> 
>> On Tue, Nov 20, 2018 at 2:14 PM Julian Hyde <jhyde@xxxxxxxxxx <mailto:jhyde@xxxxxxxxxx>> wrote:
>> 
>>> Thanks Andrei, that’s the discussion I was thinking of.
>>> 
>>> In Mykola’s case, I think it would be useful to solve the end-to-end
>>> problem. Given a Calcite query “select … from t where x in (c1, c2, …,
>>> cN)”, where t is a table based on a JDBC data source, ci are constants and
>>> N is large, we want Calcite’s JDBC adapter to send a query similar to
>>> “select … from t where x in (c1, c2, …, cN)” to the JDBC source.
>>> 
>>> Do we agree on that goal?
>>> 
>>> If we agree on the goal, the next question is how that query should be
>>> represented in the RelNode/RexNode intermediate representation. The choice
>>> of that representation has implications: performance (e.g. whether we hit a
>>> stack-overflow exception or quadratic algorithm), quality (whether we are
>>> forging a new code path that is untested), surface area (are we going to
>>> need to write a lot of new code, for example new planner rules, in order to
>>> achieve parity with existing cases).
>>> 
>>> The approach I advocate in
>>> https://issues.apache.org/jira/browse/CALCITE-2630 <https://issues.apache.org/jira/browse/CALCITE-2630> <
>>> https://issues.apache.org/jira/browse/CALCITE-2630 <https://issues.apache.org/jira/browse/CALCITE-2630>> - representing the IN
>>> clause as a large, flat OR RexCall “x = c1 or x = c2 … or x = cN”, and
>>> having RexToSqlConverter translate that OR into an IN SqlNode - meets those
>>> criteria. (We may need to fix some bugs relating to quadratic performance
>>> or stack depth, but those are worth doing anyway.)
>>> 
>>> Are there other approaches that meet the same criteria? The original
>>> proposal - adding IN as a Rex operator - is a significant increase in
>>> surface area, so we would either lose functionality (e.g. not be able to
>>> push filters into the IN list) or find ourselves having to write a lot of
>>> new code and have to fix a lot of new bugs.
>>> 
>>> Julian
>>> 
>>> 
>>>> On Nov 20, 2018, at 10:06 AM, Andrei Sereda <andrei@xxxxxxxxx <mailto:andrei@xxxxxxxxx>> wrote:
>>>> 
>>>> Convert SqlInOperator to In-Expression :
>>>> https://issues.apache.org/jira/browse/CALCITE-2630 <https://issues.apache.org/jira/browse/CALCITE-2630>
>>>> 
>>>> Related. full table scans and subQueryThreshold.
>>>> 
>>> https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E <https://lists.apache.org/thread.html/1a25c956262633f8ef0d224ed76400761f6797c494a21796579eb4f2@%3Cdev.calcite.apache.org%3E>
>>>> 
>>>> 
>>>> 
>>>> On Tue, Nov 20, 2018 at 12:08 PM Julian Hyde <jhyde@xxxxxxxxxx> wrote:
>>>> 
>>>>> I recall contributing to some other conversations about large IN lists
>>>>> over the past few months. Before we jump into a discussion, can you
>>> locate
>>>>> those threads? Also, if there is not a JIRA case, can you please create
>>> one?
>>>>> 
>>>>> Julian
>>>>> 
>>>>>> On Nov 20, 2018, at 8:23 AM, Mykola Zerniuk <mykola.zerniuk@xxxxxxxx
>>> .INVALID>
>>>>> wrote:
>>>>>> 
>>>>>> Dear Calcite Administrators,
>>>>>> 
>>>>>> my name is Mykola, software engineer from Ukraine.
>>>>>> 
>>>>>> I had an issue with Calcite IN operator handling.
>>>>>> 
>>>>>> Here is my previous email to you:
>>>>>> 
>>>>> 
>>> https://mail-archives.apache.org/mod_mbox/calcite-dev/201810.mbox/%3CCAL4PLbiBh1HoP0w_5ScJ1Nnxq%2BNYGP2LO2usxg_17Gs1mYgttA%40mail.gmail.com%3E
>>>>>> 
>>>>>> It is really important to us to have an option to left IN operator "as
>>>>>> is" and do not do any conversions. I implemented it a while ago at my
>>>>>> local, and it successfully works in our project.
>>>>>> 
>>>>>> Our team would be happy to have your review and contribute it to
>>> Calcite.
>>>>>> 
>>>>>> If you have no objections may i create a work item in Jira? I am
>>>>>> following these steps:
>>>>>> https://calcite.apache.org/develop/#contributing
>>>>>> 
>>>>>> Thanks a lot,
>>>>>> Mykola
>