Re: Relational algebra and signal processing

Perhaps you've thought of this already, but it sounds like streaming
relational algebra could be a good fit here.
Michael Mior

On Sun, Dec 16, 2018 at 18:39, Julian Feinauer <j.feinauer@xxxxxxxxxxxxxxxxx>
wrote:

> Hi Calcite-devs,
> I just had a very interesting mail exchange with Julian (Hyde) on the
> incubator list [1]. It was about our project CRUNCH (which is mostly about
> time series analyses and signal processing) and its relation to relational
> algebra and I wanted to bring the discussion to this list to continue here.
> We already had some discussion about how time series would work in Calcite
> [2], and it’s closely related to MATCH_RECOGNIZE.
> But, I have a more general question in mind, to ask the experts here on
> the list.
> I wonder whether signal processing and analysis tasks can be seen as a
> proper application of relational algebra.
> Disclaimer: I’m a mathematician, so I know the formalism of (relational)
> algebra pretty well, but I lack experience and knowledge in
> database theory. Most of my knowledge there comes from Calcite’s source
> code and the book by Garcia-Molina and Ullman.
> So if we take, for example, a stream of signals from a sensor, then we can
> of course do filtering or smoothing on it and this can be seen as a mapping
> between the input relation and the output relation. But as we usually need
> more than just one tuple at a time we lose many of the advantages of the
> relational theory. And then, if we analyze the signal, we can again model
> it as a mapping between relations, but the input relation is a “time
> series” and the output relation consists of “events”, so these are in some
> way different dimensions. In this situation it becomes mostly obvious where
> the main differences between time series and relational algebra are. Think
> of something simple: an event should be registered whenever the signal
> switches from FALSE to TRUE (so not for every TRUE). This could also be
> modelled with MATCH_RECOGNIZE pretty easily. But, for me it feels
> “unnatural” because we cannot use any indices (we don’t care about the
> ratio of TRUE and FALSE in the DB, except for probably some very rough
> outer bounds). And we are lacking the “right” information for the optimizer
> like estimations on the number of analysis results.
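
To make the FALSE-to-TRUE example above concrete, here is a minimal Python sketch of the edge detection described. It is not CRUNCH or Calcite code; the function name `rising_edges` and the `(timestamp, value)` sample layout are assumptions made only for illustration. It shows why each output "event" depends on two consecutive input tuples rather than one.

```python
from typing import Iterable, Iterator, Optional, Tuple


def rising_edges(samples: Iterable[Tuple[int, bool]]) -> Iterator[int]:
    """Yield the timestamp of every FALSE -> TRUE transition.

    `samples` is assumed to be ordered by timestamp (hypothetical layout).
    """
    previous: Optional[bool] = None  # no sample seen yet
    for timestamp, value in samples:
        # Emit an event only on the transition, not for every TRUE sample.
        if previous is False and value:
            yield timestamp
        previous = value


signal = [(0, False), (1, True), (2, True), (3, False), (4, True)]
events = list(rising_edges(signal))
print(events)
```

Note that the sketch is a single ordered pass over the input, which is arguably the point made above: there is no obvious place for an index to help.
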
> It gets even more complicated when moving to continuous valued signals
> (INT, DOUBLE, …), e.g., temperature readings or something.
> If we want to analyze the number of times where we have a temperature
> change of more than 5 degrees in under 4 hours, this should also be doable
> with MATCH_RECOGNIZE but again, there is no index to help us and we have no
> information for the optimizer, so it feels very “black box” for the
> relational algebra.
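
The temperature example can be sketched the same way. This is only one possible reading of "a change of more than 5 degrees in under 4 hours": a reading is flagged if it differs by more than the threshold from any earlier reading that is less than 4 hours older. The function name `large_changes`, its parameters, and the use of hours as the time unit are all assumptions for illustration.

```python
from collections import deque
from typing import Deque, Iterable, List, Tuple


def large_changes(
    readings: Iterable[Tuple[float, float]],
    max_delta: float = 5.0,      # degrees (assumed unit)
    window_hours: float = 4.0,   # hours (assumed unit)
) -> List[float]:
    """Return timestamps whose reading differs by more than `max_delta`
    from some earlier reading less than `window_hours` older.

    `readings` is assumed to be (time_in_hours, temperature), time-ordered.
    """
    window: Deque[Tuple[float, float]] = deque()
    flagged: List[float] = []
    for t, temp in readings:
        # Drop readings that are window_hours old or older ("under 4 hours").
        while window and t - window[0][0] >= window_hours:
            window.popleft()
        if any(abs(temp - old) > max_delta for _, old in window):
            flagged.append(t)
        window.append((t, temp))
    return flagged


readings = [(0.0, 20.0), (1.0, 21.0), (3.0, 26.5), (8.0, 27.0), (9.0, 21.0)]
flagged = large_changes(readings)
print(len(flagged), flagged)
```

Again this is a sequential scan with a sliding window; whether a planner could do better with specialized statistics or indices is exactly the open question raised here.
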
> I’m not sure if you get my point, but for me, the elegance of relational
> algebra was always the optimization, which starts from a declarative
> query and ends in an “optimal” physical plan. And I do not see how we can
> use much of this for the examples given above.
> Perhaps one solution would be to do the same as for spatial queries (or
> the JSON / JSONB support in PostgreSQL [3]): add specialized indices,
> statistics and optimizer rules. Then, this would make it more “relational
> algebra”-esque in the sense that there really is a possibility to apply
> transformations to a given query.
> What do you think? Am I seeing things as too complicated, or am I missing
> something?
> Julian
> [1]
> [2]
> [3]