Re: Small checkpoint data takes too much time


The checkpoint duration includes both barrier alignment and the state snapshot. Every task has to receive the barriers from all of its input channels before it triggers the state snapshot.
I suspect barrier alignment is what takes so long in your case; it is especially costly under backpressure. You can check the "checkpointAlignmentTime" metric to confirm.
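
To make this concrete, here is a rough toy model in plain Java (not Flink code). The class and method names and all the numbers are made up for illustration, not measured from your job. It assumes barrier arrival times are counted in milliseconds from the moment the checkpoint is triggered, and it shows how a single slow channel can dominate the duration even when the snapshot itself is tiny.

```java
import java.util.Arrays;

/** Toy model of checkpoint barrier alignment for one task; hypothetical, not a Flink API. */
final class BarrierAlignmentSketch {

    /** Time between the first and the last barrier arriving at the task. */
    static long alignmentMillis(long[] barrierArrivalMillis) {
        if (barrierArrivalMillis.length == 0) {
            throw new IllegalArgumentException("at least one input channel is required");
        }
        long first = Arrays.stream(barrierArrivalMillis).min().getAsLong();
        long last = Arrays.stream(barrierArrivalMillis).max().getAsLong();
        return last - first;
    }

    /** The task can only snapshot after the last barrier, so that arrival bounds the duration. */
    static long checkpointMillis(long[] barrierArrivalMillis, long snapshotMillis) {
        long last = Arrays.stream(barrierArrivalMillis).max().getAsLong();
        return last + snapshotMillis;
    }

    public static void main(String[] args) {
        // Assumed values: two fast channels and one channel delayed by backpressure.
        long[] arrivals = {200L, 350L, 159_800L};
        long snapshot = 150L; // writing a small state is quick

        System.out.println("alignment ms:  " + alignmentMillis(arrivals));
        System.out.println("checkpoint ms: " + checkpointMillis(arrivals, snapshot));
    }
}
```

In this sketch almost all of the roughly 160s comes from waiting for the slowest barrier, and the state size plays no part in it.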

Best,
Zhijiang
------------------------------------------------------------------
From: 徐涛 <happydexutao@xxxxxxxxx>
Sent: Wednesday, October 10, 2018 13:13
To: user <user@xxxxxxxxxxxxxxxx>
Subject: Small checkpoint data takes too much time

Hi,
 I recently encountered a problem in production. Checkpoints take too much time, although this doesn't affect job execution.
 I am using FsStateBackend with asynchronousSnapshots enabled, writing the data to an HDFS checkpointDataUri. I printed the metrics "lastCheckpointDuration" and "lastCheckpointSize": "lastCheckpointSize" is about 80KB, but "lastCheckpointDuration" is about 160s! Because the checkpoint data is small, I think it should not take that long. I do not know why, or which conditions may influence the checkpoint time. Has anyone encountered such a problem?
 Thanks a lot.

Best
Henry