git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Review Request (CALCITE-2689). ElasticSearch Adapter. Grouping on date / number fields.


Greetings ,
We have discovered an issue with ES aggregations when grouping on
non-textual fields (date, long). Currently the following sql fails because
for missing value
<https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-terms-aggregation.html#_missing_value_13>
we inject __MISSING__ sentinel which is not date / number parseable (it
can’t be null either) :

select max(amount), date from orders group by date -- special ES type

The solution is to make sentinel type specific :

   1. Integer.MIN_VALUE for integers
   2. 9999-12-31 for dates etc.

For low cardinality types like boolean, byte, short I’m not sure what to do
since there is high probability of collision between missing field and
actual value (eg. what value to choose for missing boolean?) :

select max(amount), isActive from orders group by isActive -- boolean type

Let me know if you solved this problem differently before. Composite
aggregations
<https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-composite-aggregation.html>
(available since 6.1) should help in future.

PR: https://github.com/apache/calcite/pull/946
JIRA: https://issues.apache.org/jira/browse/CALCITE-2689

Many Thanks in Advance,
Andrei.