
Unable to start session cluster using Docker


I used the docker-compose file from the documentation to create the cluster. The web UI starts successfully; however, the TaskManagers are unable to join.

JobManager container logs:

2018-10-04 18:13:13,907 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Rest endpoint listening at cluster:8081

2018-10-04 18:13:13,907 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - http://cluster:8081 was granted leadership with leaderSessionID=00000000-0000-0000-0000-000000000000

2018-10-04 18:13:13,907 INFO  org.apache.flink.runtime.dispatcher.DispatcherRestEndpoint    - Web frontend listening at http://cluster:8081

2018-10-04 18:13:14,012 INFO  org.apache.flink.runtime.resourcemanager.StandaloneResourceManager  - ResourceManager akka.tcp://flink@cluster:6123/user/resourcemanager was granted leadership with fencing token 00000000000000000000000000000000

2018-10-04 18:13:14,013 INFO  org.apache.flink.runtime.resourcemanager.slotmanager.SlotManager  - Starting the SlotManager.

2018-10-04 18:13:14,026 INFO  org.apache.flink.runtime.dispatcher.StandaloneDispatcher      - Dispatcher akka.tcp://flink@cluster:6123/user/dispatcher was granted leadership with fencing token 00000000-0000-0000-0000-000000000000

I am not sure why it says the web frontend is listening at cluster:8081 when the JobManager RPC address is set to jobmanager.
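The JobManager log reports its actor system as akka.tcp://flink@cluster:6123, while the TaskManager log below dials akka.tcp://flink@jobmanager:6123. Akka remoting generally only accepts messages whose recipient address matches the host name the receiving actor system was started with, so a name mismatch like this is one possible cause worth ruling out, even if both names resolve to the same machine. The sketch below uses hypothetical helpers (`akka_endpoint` and `endpoints_match` are not part of Flink) to pull the host and port out of such addresses so the two sides can be compared.

```python
import re
from typing import Optional, Tuple

# Matches addresses such as akka.tcp://flink@cluster:6123/user/resourcemanager
_AKKA_RE = re.compile(r"^akka\.tcp://(?P<system>[^@]+)@(?P<host>[^:/]+):(?P<port>\d+)")


def akka_endpoint(address: str) -> Optional[Tuple[str, int]]:
    """Return (host, port) from an akka.tcp address, or None if it does not parse."""
    match = _AKKA_RE.match(address)
    if match is None:
        return None
    return match.group("host"), int(match.group("port"))


def endpoints_match(advertised: str, dialed: str) -> bool:
    """True if both addresses name the same host string and port (no DNS lookup)."""
    a = akka_endpoint(advertised)
    d = akka_endpoint(dialed)
    return a is not None and a == d


if __name__ == "__main__":
    # Addresses copied from the JobManager and TaskManager logs in this post.
    jm = "akka.tcp://flink@cluster:6123/user/resourcemanager"
    tm = "akka.tcp://flink@jobmanager:6123/user/resourcemanager"
    print(akka_endpoint(jm), akka_endpoint(tm), endpoints_match(jm, tm))
```

The comparison is deliberately a plain string match on the host, because the question here is whether the names agree, not whether they resolve to the same IP.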

TaskManager container logs:

2018-10-04 18:19:18,818 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Connecting to ResourceManager akka.tcp://flink@jobmanager:6123/user/resourcemanager(00000000000000000000000000000000).

2018-10-04 18:19:18,818 INFO  org.apache.flink.runtime.filecache.FileCache                  - User file cache uses directory /tmp/flink-dist-cache-1bd95c51-3031-42ab-b782-14a0023921e5

2018-10-04 18:19:28,850 INFO  org.apache.flink.runtime.taskexecutor.TaskExecutor            - Could not resolve ResourceManager address akka.tcp://flink@jobmanager:6123/user/resourcemanager, retrying in 10000 ms: Ask timed out on [ActorSelection[Anchor(akka.tcp://flink@jobmanager:6123/), Path(/user/resourcemanager)]] after [10000 ms]. Sender[null] sent message of type "".
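The "Ask timed out" message by itself does not say whether the name jobmanager failed to resolve, the port was unreachable, or a TCP connection succeeded but the message was never answered. A minimal sketch, assuming Python 3 is available inside the TaskManager container, that separates the first two cases from the third; `probe` is a hypothetical helper, and port 6123 is taken from the log above.

```python
import socket
import sys


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify connectivity to host:port as 'unresolved', 'unreachable' or 'open'."""
    try:
        # Resolve the name through the container's normal resolver (/etc/hosts, DNS).
        addr = socket.gethostbyname(host)
    except (socket.gaierror, socket.herror):
        return "unresolved"
    try:
        # A completed TCP handshake means something is listening on that port.
        with socket.create_connection((addr, port), timeout=timeout):
            return "open"
    except OSError:
        return "unreachable"


if __name__ == "__main__":
    host = sys.argv[1] if len(sys.argv) > 1 else "jobmanager"
    port = int(sys.argv[2]) if len(sys.argv) > 2 else 6123
    print(f"{host}:{port} -> {probe(host, port)}")
```

If this reports "open" from the TaskManager container, basic networking is probably fine and the remaining suspect is the RPC address the JobManager advertises; "unresolved" or "unreachable" point at the container network or name resolution instead.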

I have even tried setting JOB_MANAGER_RPC_ADDRESS=cluster in the docker-compose file, but it does not work.
Both "cluster" and "jobmanager" point to localhost in the /etc/hosts file.
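Inside a container (unless it uses host networking), localhost and 127.0.0.1 refer to that container itself, not to the Docker host or to another container. So if the same mapping of jobmanager to a loopback address is present in the TaskManager container's own /etc/hosts, the TaskManager would effectively be dialing itself. A small sketch, assuming the usual hosts-file layout of an address followed by one or more names; `loopback_names` is a hypothetical helper that reports which of the given names a hosts file maps to loopback.

```python
import ipaddress
from typing import Iterable, Set


def loopback_names(hosts_text: str, names: Iterable[str]) -> Set[str]:
    """Return the subset of `names` that the hosts file maps to a loopback address."""
    wanted = set(names)
    found: Set[str] = set()
    for line in hosts_text.splitlines():
        # Strip trailing comments and skip blank lines.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        try:
            addr = ipaddress.ip_address(fields[0])
        except ValueError:
            continue  # first field is not an IP address; ignore the line
        # is_loopback covers 127.0.0.0/8 and ::1.
        if addr.is_loopback:
            found.update(wanted.intersection(fields[1:]))
    return found


if __name__ == "__main__":
    # Run inside the TaskManager container to inspect its own hosts file.
    with open("/etc/hosts") as f:
        print(sorted(loopback_names(f.read(), ["jobmanager", "cluster"])))
```

Running it in each container separately matters, because Docker typically generates a per-container /etc/hosts that can differ from the host machine's.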

Can you please let me know what the issue is here?

Vinay Patil