Deadlock in SafetyNetCloseableRegistry?


   I started a Flink program running on YARN. At first it couldn’t acquire enough resources, so this exception was thrown:

“org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not allocate all requires slots within timeout of 300000 ms. Slots required: 16, slots allocated: 7”.

  Then the JobManager automatically restarted, but after that it could no longer complete any checkpoint: each one failed with “expired before completing”. All the TaskManagers are blocked. There seems to be a deadlock in SafetyNetCloseableRegistry, which may be why the whole TaskManager is blocked. Here is the TaskManager’s stack:


  Best, Jiayi Liao

Attachment: out
Description: Binary data