Re: Flink 1.6 Job fails with IllegalStateException: Buffer pool is destroyed


I think the problem in the attached image is not the root cause of your job failure. There must be some other task or TaskManager failure. When that happens, the JobManager cancels all the related tasks, and the exception in the attached image is just a side effect of that cancellation.

You can review the JobManager log to check whether any failure caused the whole job to fail.
FYI, the TaskManager may have been killed by YARN for exceeding its memory limit. You mentioned that the job fails about half an hour after it starts, so I think it is possible that the TaskManager was killed by YARN.
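To make that log review quicker, here is a minimal sketch of a scanner that lists the lines most likely to explain the failure while skipping the secondary "Buffer pool is destroyed" errors. The class name `FailureScanner` is hypothetical, and the two patterns are assumptions based on typical log phrasing: Flink task state transitions like "... switched from RUNNING to FAILED" and the YARN NodeManager diagnostic "... is running beyond physical memory limits". Check them against what your own JobManager log (or the output of `yarn logs -applicationId <appId>`) actually contains.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: finds log lines that likely explain a job failure,
 * ignoring the "Buffer pool is destroyed" errors that tasks log while being cancelled.
 */
public class FailureScanner {

    // Assumed patterns; adjust them to the wording in your actual logs.
    private static final Pattern TASK_FAILED =
            Pattern.compile("switched from \\w+ to FAILED");
    private static final Pattern YARN_MEMORY_KILL =
            Pattern.compile("running beyond (physical|virtual) memory limits");
    private static final Pattern SECONDARY =
            Pattern.compile("Buffer pool is destroyed");

    /** Returns suspect lines in log order, skipping the secondary cancellation errors. */
    public static List<String> findSuspects(List<String> lines) {
        List<String> suspects = new ArrayList<>();
        for (String line : lines) {
            if (SECONDARY.matcher(line).find()) {
                continue; // a symptom of cancellation, not the root cause
            }
            if (TASK_FAILED.matcher(line).find() || YARN_MEMORY_KILL.matcher(line).find()) {
                suspects.add(line);
            }
        }
        return suspects;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java FailureScanner <path-to-log>");
            System.exit(2);
        }
        List<String> suspects = findSuspects(Files.readAllLines(Paths.get(args[0])));
        if (suspects.isEmpty()) {
            System.out.println("No task failure or YARN memory kill found.");
            return;
        }
        // The earliest match is usually the best place to start digging.
        System.out.println("First suspect: " + suspects.get(0));
        suspects.forEach(System.out::println);
    }
}
```

Run it as `java FailureScanner jobmanager.log`. The first reported line is only a starting point; read the stack trace that follows it in the log.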

Hi all,
I am encountering a weird problem when running Flink 1.6 in YARN per-job clusters.
The job fails about half an hour after it starts. The related logs are attached as an image.

This piece of log comes from one of the TaskManagers. There are no other related log lines.
There are no ERROR-level logs. The job just runs for tens of minutes without printing any logs
and then suddenly throws this exception.

It is reproducible in my production environment, but not in my test environment.
The 'Buffer pool is destroyed' exception is always thrown while emitting a latency marker.