Re: [DISCUSS] Unified Core API for Streaming and Batch

Hi Haibo,

Thank you for this great proposal!

Flink is a unified computing engine. It has already been unified at the
Table API and SQL API levels (though not yet completely). It would be great
if we could unify the DataSet API and the DataStream API as well.

I would also like the SQL and Table API to be translated to
StreamTransformation. Both batch and streaming SQL/Table API jobs can use the
Calcite optimizer for query optimization, so for them we would no longer need
the DataSet (batch) optimizer.
But for users who use the DataSet API directly, how would the optimization
problem be solved?


孙海波 <sunhaibotb@xxxxxxx> wrote on Monday, December 3, 2018 at 10:52 AM:

> Hi all,
> This post proposes a unified core API for streaming and batch.
> Currently, DataStream and DataSet use separate compilation processes,
> execution tasks, and basic programming models in the runtime layer, which
> complicates the system implementation.
> We think that batch jobs can be processed in the same way as streaming
> jobs, so we can unify the execution stack of DataSet into that of
> DataStream. After the unification, the DataSet API will also be built on
> top of StreamTransformation, and its basic programming model will change
> from "UDF on Driver" to "UDF on StreamOperator". Although DataSet
> operators will need to implement the StreamOperator interface after the
> unification, user jobs do not need to change, since DataSet uses the same
> UDF interfaces as DataStream.
> The unification has at least three benefits:
> 1. The system will be greatly simplified, with the same execution stack
>    for both streaming and batch jobs.
> 2. It is no longer necessary to implement two sets of Drivers (operator
>    strategies) for batch, namely chained and non-chained.
> 3. The unified programming model enables streaming and batch jobs to
>    share the same operator implementations.
> The following is the design draft. Any feedback is highly appreciated.
> Best,
> Haibo
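To make the proposed change from "UDF on Driver" to "UDF on StreamOperator" a
bit more concrete, here is a minimal Python sketch under stated assumptions.
All names (`MapFunction`, `StreamMapOperator`, `drive`) are hypothetical
stand-ins, not Flink's real classes or signatures. The sketch only models the
idea from benefit 3: one operator implementation, which owns the per-record
loop and calls the user's UDF, serves both a bounded (batch) input and an
unbounded (streaming) input, and batch differs only by an end-of-input signal.

```python
import itertools
from typing import Callable, Iterable, List


class MapFunction:
    """Hypothetical UDF interface shared by both APIs (loosely modelled on
    the idea of Flink's MapFunction; not the real class)."""

    def __init__(self, fn: Callable[[int], int]) -> None:
        self.fn = fn

    def map(self, value: int) -> int:
        return self.fn(value)


class StreamMapOperator:
    """Hypothetical 'UDF on StreamOperator': the operator receives records
    one at a time and delegates each one to the user's UDF."""

    def __init__(self, udf: MapFunction) -> None:
        self.udf = udf
        self.collected: List[int] = []
        self.finished = False

    def process_element(self, record: int) -> None:
        # Same per-record path for batch and streaming jobs.
        self.collected.append(self.udf.map(record))

    def end_input(self) -> None:
        # Only signalled for bounded (batch) inputs once they are exhausted.
        self.finished = True


def drive(operator: StreamMapOperator, source: Iterable[int],
          bounded: bool) -> StreamMapOperator:
    """Hypothetical runtime loop feeding a source into an operator."""
    for record in source:
        operator.process_element(record)
    if bounded:
        operator.end_input()
    return operator


double = MapFunction(lambda x: 2 * x)

# Batch job: a finite collection is treated as a stream that ends.
batch = drive(StreamMapOperator(double), [1, 2, 3], bounded=True)

# Streaming job: an unbounded source; the demo stops after 3 records.
stream = drive(StreamMapOperator(double),
               itertools.islice(itertools.count(1), 3), bounded=False)
```

Because the UDF only ever sees one record at a time, the same user function
and the same operator class serve both jobs; what distinguishes the batch run
is simply that its input eventually ends and `end_input` is called.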