git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Sessionizing raw events / does Calcite support ARRAY_AGG?


As of https://issues.apache.org/jira/browse/CALCITE-2224 (to be
released shortly in 1.18) Calcite supports the WITHIN GROUP clause
(that allows you to specify the order in which values are supplied to
an aggregate function) and the COLLECT aggregate function (similar to
ARRAY_AGG but returns a nested relation rather than an array).

LISTAGG (an aggregate function that concatenates its arguments into a
string) and ARRAY_AGG are not implemented but would be straightforward
follow-ups to that work.

As I noted in CALCITE-2224, the SQL standard says that ARRAY_AGG may
optionally have an ORDER BY clause inside its parentheses, e.g.

  SELECT deptno, ARRAY_AGG(empno ORDER BY sal DESC) AS emps
  FROM emp
  GROUP BY deptno

I think WITHIN GROUP could and should be used instead, viz

  SELECT deptno, ARRAY_AGG(empno) WITHIN GROUP (ORDER BY sal DESC) AS emps
  FROM emp
  GROUP BY deptno

because that is more consistent with other aggregate functions, and
would allow us to supply ARRAY_AGG without extra parser work.

Julian


On Fri, Dec 14, 2018 at 5:58 PM Kenneth Knowles <kenn@xxxxxxxxxx> wrote:
>
> Hello!
>
> My use case is sessionizing raw events without an aggregation function.
> Approximate code that I tried out:
>
> SELECT ARRAY_AGG(ROW(...))
> FROM ...
> GROUP BY SESSION(...)
>
> (followed by UNNEST to get the raw events, tagged with session info, back
> out into a stream)
>
> I get a parser error on the paren after ARRAY_AGG, presumably because it is
> an identifier treated as a column name?
>
> So I was digging through Calcite's code and my conclusion is that there is
> no implementation of ARRAY_AGG. Is there an alternative? Is there another
> way to use Calcite's streaming extensions to do sessionization of raw
> events?
>
> Kenn