
Flink 1.6 Job fails with IllegalStateException: Buffer pool is destroyed.

Hi all,
I am encountering a strange problem when running Flink 1.6 in YARN per-job clusters.
The job fails about half an hour after it starts. The related logs are attached as an image.

This log excerpt comes from one of the TaskManagers. There are no other related log lines
and no ERROR-level logs. The job runs for tens of minutes without printing any logs
and then suddenly throws this exception.

It is reproducible in my production environment, but not in my test environment.
The 'Buffer pool is destroyed' exception is always thrown while emitting a latency marker.

Attachment: image.png
Description: Binary data