Re: [DISCUSS] Flink Cluster Overview Dashboard Improvement Proposal


Hi Fabian,

thanks for starting this discussion. I agree with you that Flink's web
dashboard lacks a bit of general cluster overview information on the front
page. Your mock looks really promising to me since it shows some basic
metrics and cluster information at a glance. Apart from the source
input and sink output metrics, all of the required information should
already be available for display in the dashboard. Thus, your proposal
should only affect flink-runtime-web, which should make it easier to realize.
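
To make the health rule suggested in the quoted proposal concrete ("last heartbeat is inside heartbeat.timeout"), here is a minimal Python sketch of how one overview entry could be derived. It is only an illustration under stated assumptions, not the dashboard's actual code: the `TaskManagerStatus` type, its field names, the helper functions and the green/red color names are all hypothetical, and the 50000 ms timeout in the usage note is an assumed value that should be read from the cluster's own `heartbeat.timeout` setting.

```python
from dataclasses import dataclass


@dataclass
class TaskManagerStatus:
    """Hypothetical snapshot of one Task Manager; field names are illustrative."""
    name: str
    last_heartbeat_ms: int  # epoch millis at which the last heartbeat arrived
    slots_total: int
    slots_free: int


def health_color(last_heartbeat_ms: int, now_ms: int, heartbeat_timeout_ms: int) -> str:
    """Return "green" while the last heartbeat lies inside the timeout window, else "red"."""
    age_ms = now_ms - last_heartbeat_ms
    return "green" if age_ms <= heartbeat_timeout_ms else "red"


def slot_usage(slots_total: int, slots_free: int) -> float:
    """Return the fraction of used slots in [0, 1]; 0.0 if there are no slots."""
    if slots_total <= 0:
        return 0.0
    used = slots_total - slots_free
    # Clamp so inconsistent input can never produce a bar outside 0..100 %.
    return max(0.0, min(1.0, used / slots_total))


def overview_row(tm: TaskManagerStatus, now_ms: int, heartbeat_timeout_ms: int) -> dict:
    """Combine health color and slot usage into one entry of the overview."""
    return {
        "name": tm.name,
        "health": health_color(tm.last_heartbeat_ms, now_ms, heartbeat_timeout_ms),
        "slot_usage": slot_usage(tm.slots_total, tm.slots_free),
    }
```

For example, `overview_row(TaskManagerStatus("tm-1", 10_000, 8, 2), now_ms=20_000, heartbeat_timeout_ms=50_000)` yields a green entry with a slot usage of 0.75. The slot usage is deliberately kept as a plain number and not mapped to a color, following the remark in the proposal that a fully utilised Task Manager is not necessarily something bad.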

I'm in favour of adding this feature to Flink's dashboard to make it
available to the whole community.

Cheers,
Till

On Tue, Oct 9, 2018 at 12:54 PM Fabian Wollert <fabian@xxxxxxxxxx> wrote:

> Argh, I think the screenshot is missing (at least Nabble is not showing
> anything). Here is a link to the mockup:
>
>
> https://drive.google.com/file/d/1p3wVP028_AFFLZ6fjPb41yAI8zUhgDTO/view?usp=sharing
>
> Cheers
>
> --
>
>
> *Fabian Wollert*
> *Zalando SE*
>
> E-Mail: fabian@xxxxxxxxxx
>
>
> On Tue, Oct 9, 2018 at 12:46 PM Fabian Wollert <fabian@xxxxxxxxxx> wrote:
>
>> Hi everyone,
>>
>> Disclaimer: I read the contribution guide about improvement requests
>> (i.e. I should actually just open a Jira ticket), but I thought it would
>> make sense to run this through the mailing list first. After
>> collecting some input, I would then create the Jira ticket.
>>
>> When accessing the Flink Web Dashboard (which I do almost every day
>> to check the status of a job or similar), I recently felt that the
>> information given in the top portion of the start page could be improved
>> considerably. I created a first mock-up by moving HTML elements around and
>> want to share it now:
>>
>> [image: image.png]
>>
>> With the exception of the metrics (see below), none of this information
>> is new; it is merely reorganized to speed up investigation and
>> monitoring:
>>
>>    - complete overview of the cluster status and health, without
>>    clicking through a lot of pages:
>>       - Active and stand-by Job Managers. Their health is also depicted as
>>       a color (as a first suggestion: last heartbeat is within heartbeat.timeout)
>>       - Currently registered Task Managers
>>          - the little bar on the side indicates task slot usage. I did
>>          not color it since a fully utilised Task Manager is not necessarily
>>          something bad.
>>          - the color indicates the health of the Task Manager (as a
>>          first suggestion: last heartbeat is within heartbeat.timeout)
>>    - overview of some cluster metrics
>>
>> Some points to note:
>>
>>    - All data you see in the screenshot is mock data; the numbers are not
>>    consistent with one another. The colors, however, already match the
>>    numbers they represent.
>>    - All of this could also be done with whatever monitoring solution a
>>    company already has, by reading out the JMX metrics and plotting them
>>    there (e.g. in Grafana). But an out-of-the-box solution would save
>>    everyone from building it on their own, and they could trust the
>>    metrics shown here.
>>    - Some of the metrics can only be implemented once FLINK-7286
>>    <https://issues.apache.org/jira/browse/FLINK-7286> is done. So I
>>    would split the implementation into two parts (cluster overview and
>>    metrics) and do them separately.
>>    - This first mock-up is targeted at what we here at Zalando would
>>    like to see at first glance, so it fits our use case very well. We
>>    mostly use long-running session clusters.
>>    - I'm more of a backend guy with some frontend expertise (mostly in
>>    React; I have no experience with Angular 1, which the Flink Web
>>    Dashboard is currently built with) and not at all a designer.
>>
>> What do you think? I would be glad to get some feedback on this,
>> especially on whether it makes sense for the broader community. I will
>> implement this one way or another: if not in the Flink master branch,
>> then as an open-source project which anyone can deploy next to their
>> Flink clusters. But I first wanted to run it through here to see if it
>> sparks any interest.
>>
>> Please also let me know if you already see difficulties in implementing
>> this; maybe I have overlooked something.
>>
>> Can't wait for your input.
>>
>> Cheers
>>
>> --
>>
>>
>> *Fabian Wollert*
>> *Zalando SE*
>>
>> E-Mail: fabian@xxxxxxxxxx
>>
>