
Re: [DISCUSS] FLIP-6 Problems

Hi Renjie,

1) You're right that the Flink session mode does not give you proper job
isolation. This is the same behaviour as in the Flink 1.4 session mode. If
isolation is a strong requirement for you, then I recommend using the per job
mode.

2) At the moment it is also not possible to define per job resource
requirements when using the session mode. The community has started
implementing this feature, but it is not finished yet, and I assume work on it
will continue. For now, the per job mode is the way to avoid wasting
resources.

3) I think the ResourceID assigned to a TaskManager is shown in the web UI
and is returned by the "/taskmanagers" REST endpoint. The ResourceID is
derived from the Mesos task ID. Would that help you identify which TM is
running in which Mesos task?
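For illustration, here is a minimal sketch of reading those ResourceIDs from the "/taskmanagers" REST endpoint. It assumes the JSON layout of the Flink REST API (a top-level `taskmanagers` array whose entries carry `id` and `path` fields) and assumes the REST server listens on `http://localhost:8081`; the helper names `parse_taskmanager_ids` and `fetch_taskmanager_ids` and the sample ids are hypothetical, not part of Flink.

```python
import json
import urllib.request
from typing import Dict, List, Optional


def parse_taskmanager_ids(payload: str) -> List[Dict[str, Optional[str]]]:
    """Extract the ResourceID ("id") and address ("path") of each TaskManager.

    Assumption: the response looks like
    {"taskmanagers": [{"id": ..., "path": ..., ...}, ...]}.
    """
    data = json.loads(payload)
    result = []
    for tm in data.get("taskmanagers", []):
        # "id" is assumed to hold the TaskManager's ResourceID.
        result.append({"id": tm.get("id"), "path": tm.get("path")})
    return result


def fetch_taskmanager_ids(base_url: str = "http://localhost:8081") -> List[Dict[str, Optional[str]]]:
    """Query a running cluster; base_url is an assumed default, adjust as needed."""
    url = base_url.rstrip("/") + "/taskmanagers"
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_taskmanager_ids(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical sample response; the ids and paths are made up for illustration.
    sample = json.dumps({
        "taskmanagers": [
            {"id": "tm-a", "path": "akka.tcp://flink@host-a:40000/user/taskmanager_0"},
            {"id": "tm-b", "path": "akka.tcp://flink@host-b:40000/user/taskmanager_0"},
        ]
    })
    for tm in parse_taskmanager_ids(sample):
        print(tm["id"], tm["path"])
```

Matching each printed id against the task ids listed by Mesos is one way to correlate TaskManagers with containers, assuming the id is derived from the Mesos task id as described above.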


On Tue, Jun 5, 2018 at 5:13 AM Renjie Liu <liurenjie2008@xxxxxxxxx> wrote:

> ---------- Forwarded message ---------
> From: Renjie Liu <liurenjie2008@xxxxxxxxx>
> Date: Tue, Jun 5, 2018 at 10:43 AM
> Subject: [DISCUSS] FLIP-6 Problems
> To: user <user@xxxxxxxxxxxxxxxx>
> Hi,
> We've deployed Flink 1.5.0 and tested the new cluster manager; it's great
> that Flink is now elastic. However, we've also found some problems that
> block us from deploying it to our production environment.
> 1. Task manager isolation. Currently Flink allows different jobs to execute
> on the same task managers. This is unacceptable in a production environment,
> since a badly written job may kill task managers and affect other jobs.
> 2. Per job resource configuration. Currently a Flink session cluster can only
> allocate task managers of the same size and configuration. This may waste a
> lot of resources if we have many jobs with different resource requirements.
> 3. Task manager names are meaningless. This is a problem because we can't
> monitor the status of containers in a Mesos environment.
> One solution to the above problems is to use a per job cluster, but a
> centralized cluster manager could manage Flink deployments and jobs
> better.
> What do you think about this? If the community agrees with us, we would
> like to propose a design and implementation to enhance the Flink cluster
> manager.
> --
> Liu, Renjie
> Software Engineer, MVAD