Re: Clean shutdown of streaming job


Hi Neils,

Thanks for the response.

> I think your problem is that the Cassandra sink doesn't support exactly
> once guarantees when the Cassandra query isn't idempotent. If possible, the
> cleanest solution would be implementing a new or extending the existing
> Cassandra sink with the
> https://ci.apache.org/projects/flink/flink-docs-stable/api/java/index.html?org/apache/flink/streaming/api/functions/sink/TwoPhaseCommitSinkFunction.html
> interface, and setting your environment to exactly-once guarantee.

You are right. The culprit is that the Cassandra queries are not
idempotent.

I did consider implementing a custom sink on top of the two-phase
commit sink function. However, doing that against an external system
that doesn't have any transaction support is non-trivial. We would have
to build undo logs and roll the writes back from the application side
whenever a transaction aborts.
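
To make the undo-log idea concrete, here is a rough sketch in plain
Python. It is not Flink or Cassandra code; CounterStore and
UndoLoggingWriter are hypothetical names, and the store is an in-memory
stand-in for a system without transactions.

```python
class CounterStore:
    """Stand-in for an external store with no transaction support."""

    def __init__(self):
        self.rows = {}

    def increment(self, key, delta):
        # Non-idempotent write: replaying it changes the result again.
        self.rows[key] = self.rows.get(key, 0) + delta


class UndoLoggingWriter:
    """Records a compensating action for every write in the open transaction."""

    def __init__(self, store):
        self.store = store
        self.undo_log = []

    def write(self, key, delta):
        self.store.increment(key, delta)
        self.undo_log.append((key, -delta))

    def commit(self):
        # Nothing left to undo once the transaction is committed.
        self.undo_log.clear()

    def abort(self):
        # Apply the compensating writes in reverse order.
        while self.undo_log:
            key, inverse = self.undo_log.pop()
            self.store.increment(key, inverse)


store = CounterStore()
writer = UndoLoggingWriter(store)

writer.write("a", 5)
writer.commit()

writer.write("a", 3)
writer.write("b", 1)
writer.abort()  # "a" goes back to 5, "b" goes back to 0
```

Even this toy version skips the hard parts: the undo log itself has to
survive a crash, and other readers can see the intermediate state
before the rollback happens.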

That is what led me to think that pausing the Kafka stream might be the
simplest and cleanest solution here. It doesn't require the sink to be
exactly-once, and it still provides a clean shutdown approach, which may
have broader applications.
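
A toy model of what I have in mind, again in plain Python and not tied
to any Flink or Kafka API. PausableSource and shutdown_cleanly are
hypothetical names. The point is only that once the source stops
emitting and the in-flight records are drained, the resume offset and
the sink state agree, so a non-idempotent write is never replayed.

```python
from collections import deque


class PausableSource:
    """Hypothetical source: hands out records until it is paused."""

    def __init__(self, records):
        self.records = list(records)
        self.offset = 0  # index of the next record to read
        self.paused = False

    def poll(self):
        if self.paused or self.offset >= len(self.records):
            return None
        record = self.records[self.offset]
        self.offset += 1
        return record


def shutdown_cleanly(source, in_flight, apply):
    """Pause the source, drain in-flight records, return the resume offset."""
    source.paused = True
    while in_flight:
        apply(in_flight.popleft())
    return source.offset


totals = {}


def apply(record):
    # Non-idempotent sink write (a counter increment).
    key, delta = record
    totals[key] = totals.get(key, 0) + delta


source = PausableSource([("a", 1), ("a", 2), ("b", 5), ("a", 4)])
in_flight = deque()

# Three records have been read, but only one has reached the sink
# when the shutdown is requested.
for _ in range(3):
    in_flight.append(source.poll())
apply(in_flight.popleft())

resume_offset = shutdown_cleanly(source, in_flight, apply)
```

After the drain, every record below resume_offset has been applied
exactly once and nothing at or above it has been read, so restarting
from that offset does not replay anything.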

--
Ning