Re: Exception-handling in built-in functions


Vladimir,

You’ve made your points. And I hear them.

However I get the impression that you are not open to persuasion. Which means that I am wasting my time trying to reach consensus with you. Which means that people win arguments not on merit, but based upon who is most persistent.

Here is my point. Calcite's goal is not to re-create what Oracle or PostgreSQL did ten years later. It is a platform that allows people to write their own data engine. If they want to redefine the “+” operator such that 2 + 2 returns 5, the platform should allow it.

Certainly if they want to engineer their own error-handling strategy, we should let them do it. I didn’t have the energy to find an example of a SQL engine that discards rows with divide-by-zero errors, but I believe there is one. I suspect that Broadbase, SQLstream and Hive, three SQL engines that I have worked on that performed ETL-like tasks, all had that capability. And all ETL tools have very flexible error-handling strategies. They are not SQL-based, but Calcite is not exclusively for SQL systems.
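
To make "flexible error-handling strategy" concrete, here is a minimal sketch of an engine that lets its user pick what a divide-by-zero does to a row. All names (ErrorPolicyDemo, ErrorPolicy, divideAll) are hypothetical and exist only for this illustration; none of this is Calcite API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; these names are not part of Calcite.
final class ErrorPolicyDemo {
  /** What the engine does when evaluating an expression for a row fails. */
  enum ErrorPolicy { FAIL, SKIP_ROW, NULL_VALUE }

  /** Computes row[0] / row[1] for each row under the chosen policy. */
  static List<Integer> divideAll(int[][] rows, ErrorPolicy policy) {
    List<Integer> out = new ArrayList<>();
    for (int[] row : rows) {
      try {
        out.add(row[0] / row[1]);
      } catch (ArithmeticException e) {
        switch (policy) {
          case SKIP_ROW:
            break;            // discard the offending row
          case NULL_VALUE:
            out.add(null);    // keep the row, emit a default value
            break;
          default:
            throw e;          // FAIL: abort, as a strict SQL engine would
        }
      }
    }
    return out;
  }

  public static void main(String[] args) {
    int[][] rows = {{6, 3}, {1, 0}, {8, 2}};
    System.out.println(divideAll(rows, ErrorPolicy.SKIP_ROW));   // [2, 4]
    System.out.println(divideAll(rows, ErrorPolicy.NULL_VALUE)); // [2, null, 4]
  }
}
```

The point of the sketch is only that the policy is chosen by whoever builds the engine, not hard-wired by the platform.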

I have been designing and building world-class data engines for 30 years. Please take me on good faith that a flexible error-handling strategy is a good idea. Don’t force me to bicker over email for hours and hours. When a long discussion leads to the rejection of a contribution, I get considerably closer to burning out.

Julian


> On Oct 17, 2018, at 11:36 AM, Vladimir Sitnikov <sitnikov.vladimir@xxxxxxxxx> wrote:
> 
> Julian> Hey, folks. We need your input here.
> 
> Here are my thoughts:
> 1) I think the features we add should have at least some level of
> consistency
> 2) It is much safer to adopt well-known features rather than be pioneers in
> the field. I do not mean we must wait for someone else to implement and try
> out a feature; however, I would not rush to implement a feature that
> no one else has explored.
> 
> CALCITE-525 has two key points:
> A) The current enumerable implementation factors expressions like 0/0 out
> into a static field of the generated code. This causes the generated code
> to fail at load time, even before the query is executed.
> Of course that is a bug, and I'm even inclined to remove those "static
> fields".
> 
> B) Someone (Hongze? Julian?) suggested implementing a mode that silently
> ignores the error (e.g. by ignoring the row or by returning a default value).
> First of all, I don't think "ignore the row" kind of processing would do
> the user any good, since it would not be possible to predict the output.
> "Ignore the row" is very tricky when a join/semijoin/antijoin is involved.
> 
> I'm sure OracleDB and PostgreSQL do NOT have such "features", so I think we
> should not rush into it.
> 
> C) Hongze suggests CATCH_ERROR(1 / 0 EMPTY ON ERROR) or
> CATCH_ERROR(1 / 0) EMPTY ON ERROR kind of functions.
> That makes it possible to confine the scope of the error; however, I don't
> think it would be used often (does that mean one would have to wrap each
> expression?), and this "catch error" would be non-trivial to propagate to
> the downstream executors.
> On top of that, we might end up inventing full-blown try-catch-catch-catch
> syntax.
> 
> I truly see no business value in implementing B/C; however, I do see the
> pain it would introduce. It would complicate Calcite maintenance. "B" could
> silently produce wrong results, and I'm sure we don't want to get results
> out of thin air.
> 
> Vladimir
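
For readers unfamiliar with the failure mode in point A, the following is a simplified sketch of why hoisting an expression like 0/0 into a static field fails before any row is processed. It is plain Java with hypothetical names (StaticFieldDemo, Hoisted, Inline), not Calcite's actual generated code, and it assumes only standard JVM class-initialization behavior. The zero() helper is there so the division is evaluated at run time rather than folded by the compiler.

```java
// Hypothetical illustration only; this is not Calcite's generated code.
final class StaticFieldDemo {
  static int zero() { return 0; }

  /** Mimics generated code that hoists an expression into a static field. */
  static class Hoisted {
    // Evaluated once, during class initialization.
    static final int CONST = 0 / zero();
    static int eval(int x) { return x + CONST; }
  }

  /** Mimics generated code that evaluates the expression per call. */
  static class Inline {
    static int eval(int x) { return x + 0 / zero(); }
  }

  static String tryHoisted() {
    try {
      return String.valueOf(Hoisted.eval(1));
    } catch (LinkageError e) {
      // First use: ExceptionInInitializerError (cause: ArithmeticException).
      // Later uses of the same class: NoClassDefFoundError.
      return e.getClass().getSimpleName();
    }
  }

  static String tryInline() {
    try {
      return String.valueOf(Inline.eval(1));
    } catch (ArithmeticException e) {
      // An ordinary exception raised at evaluation time.
      return e.getClass().getSimpleName();
    }
  }

  public static void main(String[] args) {
    System.out.println(tryHoisted());
    System.out.println(tryInline());
  }
}
```

In the hoisted form the class itself becomes unusable, so no row-level handling is possible; in the inline form the error surfaces as a normal exception at the point of evaluation.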