git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Flink 1.5.2 - excessive ammount of container requests, Received new/Returning excess container "flood"


Hey guys,

thanks for the replies.
1. "Requesting new TaskExecutor" looks fine as it's exactly 32 as is jobs'
parallelism set.
The weird thing is that after those 32 containers requested and received we
have this "flood" of 'Received new container/Returning excess container`
(and as shown below it's actually doing something on YARN side)
Where does those come from?
2. I felt that DEBUG will be needed, we'll see what we can do about it.
3. Yes, all in favor for upgrading to 1.5.4. But as Gary mentioned there
seems to be no fixes that could heal it (I was reading release notes
previous to posting this thread ; )).
4. Hadoop: 2.6.0+cdh5.14.0

Here are logs for one of "excess" containers:
1. Flink JM
2018-10-09 17:35:33,493 INFO  org.apache.flink.yarn.YarnResourceManager                    
- Received new container: container_e96_1538374332137_3071_01_2485560 -
Remaining pending container requests: 0
2018-10-09 17:35:33,493 INFO  org.apache.flink.yarn.YarnResourceManager                    
- Returning excess container container_e96_1538374332137_3071_01_2485560.
2. YARN
2018-10-09 17:35:33,283 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e96_1538374332137_3071_01_2485560 Container Transitioned from NEW
to ALLOCATED
2018-10-09 17:35:33,283 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=user       
OPERATION=AM Allocated Container        TARGET=SchedulerApp    
RESULT=SUCCESS  APPID=application_1538374332137_3071      
CONTAINERID=container_e96_1538374332137_3071_01_2485560
2018-10-09 17:35:33,283 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Assigned container container_e96_1538374332137_3071_01_2485560 of capacity
<memory:6144, vCores:1> on host server:44142, which has 5 containers,
<memory:30720, vCores:5> used and <memory:2048, vCores:11> available after
allocation
2018-10-09 17:35:33,283 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
assignedContainer application attempt=appattempt_1538374332137_3071_000001
container=Container: [ContainerId:
container_e96_1538374332137_3071_01_2485560, NodeId: server:44142,
NodeHttpAddress: server:8042, Resource: <memory:6144, vCores:1>, Priority:
0, Token: null, ] queue=queue: capacity=0.5, absoluteCapacity=0.5,
usedResources=<memory:2353152, vCores:383>, usedCapacity=1.9947916,
absoluteUsedCapacity=0.9973958, numApps=2, numContainers=383
clusterResource=<memory:2359296, vCores:1152>
2018-10-09 17:35:33,485 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e96_1538374332137_3071_01_2485560 Container Transitioned from
ALLOCATED to ACQUIRED
2018-10-09 17:35:38,532 INFO
org.apache.hadoop.yarn.server.resourcemanager.rmcontainer.RMContainerImpl:
container_e96_1538374332137_3071_01_2485560 Container Transitioned from
ACQUIRED to RELEASED
2018-10-09 17:35:38,532 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.common.fica.FiCaSchedulerApp:
Completed container: container_e96_1538374332137_3071_01_2485560 in state:
RELEASED event:RELEASED
2018-10-09 17:35:38,532 INFO
org.apache.hadoop.yarn.server.resourcemanager.RMAuditLogger: USER=user       
IP=ip      OPERATION=AM Released Container TARGET=SchedulerApp    
RESULT=SUCCESS  APPID=application_1538374332137_3071      
CONTAINERID=container_e96_1538374332137_3071_01_2485560
2018-10-09 17:35:38,532 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.SchedulerNode:
Released container container_e96_1538374332137_3071_01_2485560 of capacity
<memory:6144, vCores:1> on host server:44142, which currently has 0
containers, <memory:0, vCores:0> used and <memory:32768, vCores:16>
available, release resources=true
2018-10-09 17:35:38,532 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.LeafQueue:
completedContainer container=Container: [ContainerId:
container_e96_1538374332137_3071_01_2485560, NodeId: server:44142,
NodeHttpAddress: server:8042, Resource: <memory:6144, vCores:1>, Priority:
0, Token: Token { kind: ContainerToken, service: ip:44142 }, ] queue=queue:
capacity=0.5, absoluteCapacity=0.5, usedResources=<memory:589824,
vCores:96>, usedCapacity=0.5, absoluteUsedCapacity=0.25, numApps=2,
numContainers=96 cluster=<memory:2359296, vCores:1152>
2018-10-09 17:35:38,532 INFO
org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler:
Application attempt appattempt_1538374332137_3071_000001 released container
container_e96_1538374332137_3071_01_2485560 on node: host: server:44142
#containers=0 available=32768 used=0 with event: RELEASED

Best regards,
Borys Gogulski



--
Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/