Flink application down due to RpcTimeout exception


Hi All,
I'm running Flink 1.6 on YARN. After the program has run for about a day, it fails on YARN with the error log shown below.
It seems to be caused by a timeout, but I have the following questions:
1. At which step did the communication between Flink components fail, and which two components are involved?
2. How can I solve this problem?
Thanks a lot!!

java.lang.Exception: Cannot deploy task LeftOuterJoin(where: (=(id, article_id)), join: (id, created_time, article_score, PU, article_id, CU, CN)) -> select: (id, created_time, article_score, PU, CU, CN) (2/2) (d403002a7accc5133cf89a386ddc1dfb) - TaskManager (container_1532509321420_463249_01_000002 @ sh-bs-3-i1-hadoop-17-225 (dataPort=10459)) not responding after a rpcTimeout of 10000 ms
	at org.apache.flink.runtime.executiongraph.Execution.lambda$deploy$5(Execution.java:601) ~[flink-runtime_2.11-1.6.0.jar:1.6.0]
	at java.util.concurrent.CompletableFuture.uniWhenComplete(CompletableFuture.java:760) ~[na:1.8.0_65]
	at java.util.concurrent.CompletableFuture$UniWhenComplete.tryFire(CompletableFuture.java:736) ~[na:1.8.0_65]
	at java.util.concurrent.CompletableFuture$Completion.run(CompletableFuture.java:442) ~[na:1.8.0_65]
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[na:1.8.0_65]
	at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[na:1.8.0_65]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) ~[na:1.8.0_65]
	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) ~[na:1.8.0_65]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[na:1.8.0_65]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[na:1.8.0_65]
	at java.lang.Thread.run(Thread.java:745) ~[na:1.8.0_65]
Caused by: akka.pattern.AskTimeoutException: Ask timed out on [Actor[akka.tcp://flink@sh-bs-3-i1-hadoop-17-225:24213/user/taskmanager_0#-1762816591]] after [10000 ms]. Sender[null] sent message of type "org.apache.flink.runtime.rpc.messages.RemoteRpcInvocation".
	at akka.pattern.PromiseActorRef$$anonfun$1.apply$mcV$sp(AskSupport.scala:604) ~[akka-actor_2.11-2.4.20.jar:na]
	at akka.actor.Scheduler$$anon$4.run(Scheduler.scala:126) ~[akka-actor_2.11-2.4.20.jar:na]
	at scala.concurrent.Future$InternalCallbackExecutor$.unbatchedExecute(Future.scala:601) ~[scala-library-2.11.8.jar:na]
	at scala.concurrent.BatchingExecutor$class.execute(BatchingExecutor.scala:109) ~[scala-library-2.11.8.jar:na]
	at scala.concurrent.Future$InternalCallbackExecutor$.execute(Future.scala:599) ~[scala-library-2.11.8.jar:na]
	at akka.actor.LightArrayRevolverScheduler$TaskHolder.executeTask(LightArrayRevolverScheduler.scala:329) ~[akka-actor_2.11-2.4.20.jar:na]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.executeBucket$1(LightArrayRevolverScheduler.scala:280) ~[akka-actor_2.11-2.4.20.jar:na]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.nextTick(LightArrayRevolverScheduler.scala:284) ~[akka-actor_2.11-2.4.20.jar:na]
	at akka.actor.LightArrayRevolverScheduler$$anon$4.run(LightArrayRevolverScheduler.scala:236) ~[akka-actor_2.11-2.4.20.jar:na]
	... 1 common frames omitted


Best,
Henry