Re: [DISCUSS] Unified Core API for Streaming and Batch
Thank you for this great proposal!
Flink is a unified computing engine. It has already been unified at the
Table API and SQL API levels (not yet complete). It would be great if we
could also unify the DataSet API and DataStream API.
I would also like the SQL and Table API to be translated to
StreamTransformation, because both batch (SQL/Table API) and stream
(SQL/Table API) can use the Calcite optimizer for query optimization, so
we can ignore the optimization of
But for users who use only the DataSet API, how would the optimization
problem be solved?
孙海波 <sunhaibotb@xxxxxxx> wrote on Mon, Dec 3, 2018 at 10:52 AM:
> Hi all,
> This post proposes a unified core API for Streaming and Batch.
> Currently DataStream and DataSet adopt separate compilation processes,
> execution tasks and basic programming models in the runtime layer, which
> complicates the system implementation.
> We think that batch jobs can be processed in the same way as streaming
> jobs, thus we can unify the execution stack of DataSet into that of
> DataStream. After the unification the DataSet API will also be built on
> top of StreamTransformation, and its basic programming model will be
> changed from "UDF on Driver" to "UDF on StreamOperator". Although the
> DataSet operators will need to implement the interface StreamOperator
> instead after the unification, user jobs do not need to change since
> DataSet uses the same UDF interfaces as DataStream.
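
To make the "UDF on Driver" versus "UDF on StreamOperator" distinction
concrete, here is a minimal, self-contained Java sketch. It does not use
Flink's real classes: MapUdf, MapDriver, MapOperator and the two run
helpers are hypothetical names invented for illustration only. The sketch
assumes a driver pulls records from a bounded input, while a stream
operator is pushed one record at a time. The same user function is reused
unchanged in both models.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Illustration only: every type here is hypothetical, not a Flink interface.
class UnifiedUdfSketch {

    /** The user-facing UDF, shared unchanged by both execution models. */
    interface MapUdf<I, O> {
        O map(I value);
    }

    /** "UDF on Driver": the driver pulls records from a bounded input itself. */
    static final class MapDriver<I, O> {
        private final MapUdf<I, O> udf;

        MapDriver(MapUdf<I, O> udf) {
            this.udf = udf;
        }

        void run(Iterable<I> input, Consumer<O> output) {
            for (I record : input) {
                output.accept(udf.map(record));
            }
        }
    }

    /** "UDF on StreamOperator": the runtime pushes one record per call. */
    static final class MapOperator<I, O> {
        private final MapUdf<I, O> udf;
        private final Consumer<O> output;

        MapOperator(MapUdf<I, O> udf, Consumer<O> output) {
            this.udf = udf;
            this.output = output;
        }

        void processElement(I record) {
            output.accept(udf.map(record));
        }
    }

    /** Runs the UDF through the pull-based driver and collects the results. */
    static <I, O> List<O> runWithDriver(MapUdf<I, O> udf, List<I> input) {
        List<O> out = new ArrayList<>();
        new MapDriver<>(udf).run(input, out::add);
        return out;
    }

    /** Feeds the same bounded input to the push-based operator, record by record. */
    static <I, O> List<O> runWithOperator(MapUdf<I, O> udf, List<I> input) {
        List<O> out = new ArrayList<>();
        MapOperator<I, O> operator = new MapOperator<>(udf, out::add);
        for (I record : input) {
            operator.processElement(record);
        }
        return out;
    }

    public static void main(String[] args) {
        MapUdf<Integer, Integer> doubler = x -> x * 2;
        List<Integer> input = Arrays.asList(1, 2, 3);
        System.out.println(runWithDriver(doubler, input));
        System.out.println(runWithOperator(doubler, input));
    }
}
```

Both helpers produce the same result for the same UDF, which is the sense
in which user jobs would not need to change. Only the wrapper around the
UDF differs.
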
> The unification has at least three benefits:
> 1. The system will be greatly simplified with the same execution stack for
> both streaming and batch jobs.
> 2. It is no longer necessary to implement two sets of Driver(s) (operator
> strategies) for batch, namely chained and non-chained.
> 3. The unified programming model enables streaming and batch jobs to share
> the same operator implementation.
> The following is the design draft. Any feedback is highly appreciated.