git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: connection failed when running flink in a cluster


Hi,

Can you try submitting with:

    ./bin/flink run examples/streaming/SocketWindowWordCount.jar --hostname <IP> --port 9000

where IP is the IP of the node where you started nc? If not specified, the
default hostname is localhost. This problematic is if the source operator is
scheduled on a different physical machine.

Best,
Gary

On Mon, Aug 6, 2018 at 6:01 PM, Felipe Gutierrez <felipe.o.gutierrez@xxxxxxxxx> wrote:
Hi Vino,

the UI shows the job as completed.
I had run "./bin/flink run examples/streaming/WordCount.jar" and I get no error.

When I start netcat "nc -l 9000" and in other terminal I run "./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000" I have this exception.

Starting execution of program

------------------------------------------------------------
 The program finished with the following exception:

org.apache.flink.client.program.ProgramInvocationException: java.net.ConnectException: Connection refused
at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:264)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:464)
at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
at org.apache.flink.streaming.examples.socket.SocketWindowWordCount.main(SocketWindowWordCount.java:92)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:528)
at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:420)
at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:404)
at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:785)
at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:279)
at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:214)
at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1025)
at org.apache.flink.client.cli.CliFrontend.lambda$main$9(CliFrontend.java:1101)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1836)
at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1101)
Caused by: java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:745)



--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Mon, Aug 6, 2018 at 5:54 PM vino yang <yanghua1127@xxxxxxxxx> wrote:
Hi Felipe,

You got the result? And the web UI shown the job is completed?

If it throws the exception you provided, the job's status should be failed.

Thanks, vino.

2018-08-06 23:42 GMT+08:00 Felipe Gutierrez <felipe.o.gutierrez@xxxxxxxxx>:
yes. with this example (examples/streaming/WordCount.jar) my cluster worked.

the file log/*out from the master is still empty and the file log/*out from the slave node has my result. The dashboard also shows that the job is completed.

So, like you said there are some external dependencies that I didn`t include in my deploy. Do you have any clue?


Kind Regards,
Felipe




--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Mon, Aug 6, 2018 at 5:01 PM Gary Yao <gary@xxxxxxxxxxxxxxxxx> wrote:
Hi,

nc exits after the first connection is closed. Are you re-running the nc
command every time the job finishes?

The stacktrace you copied does not indicate that a TaskManager cannot connect
to the JobManager. I can only see that the SocketTextStreamFunction (from the
SocketWindowWordCount job?) cannot open the connection to the address that you
specified.

Can you try to run examples/streaming/WordCount.jar. It is a simpler job which
does not rely on external dependencies.

If all the above fails, can you tell us how you submit the job? Can you post
the full command? Can you also post the full JobManager & TaskManager logs?

Best,
Gary



On Mon, Aug 6, 2018 at 4:10 PM, Felipe Gutierrez <felipe.o.gutierrez@xxxxxxxxx> wrote:
do you mean "nc -l 9000"? If so, I did start before.
the task manager running on the master can connect to the job manager. but the task manager on the slave node cannot. The second time that I start the WordCount task it recognizes only one task manager (from the master) and runs my task. But the task manager from the slave does not process anything and it is started.

here is the error stack trace from the slave node:

2017-05-30 05:10:39,853 INFO  org.apache.flink.runtime.state.heap.HeapKeyedStateBackend     - Initializing heap keyed state backend with stream factory.
2017-05-30 05:10:39,977 INFO  org.apache.flink.runtime.taskmanager.Task                     - Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23) switched from RUNNING to FAILED.
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:745)
2017-05-30 05:10:40,016 INFO  org.apache.flink.runtime.taskmanager.Task                     - Freeing task resources for Source: Socket Stream -> Flat Map (1/1) (d5e3d87395995d3977d2f472de896e23).


--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Mon, Aug 6, 2018 at 2:17 PM vino yang <yanghua1127@xxxxxxxxx> wrote:
Hi Felipe,

From the exception information, it seems that you did not start the socket server, the socket source needs to connect to the socket server.

Please make sure the socket server has started and is available.

Thanks, vino.

2018-08-06 18:45 GMT+08:00 Felipe Gutierrez <felipe.o.gutierrez@xxxxxxxxx>:
yes.

when I execute the jps command on the master node I see TaskManagerRunner and StandaloneSessionClusterEntrypoint (which I believe it is the  jobManager). On the slave nodes I see TaskManagerRunner when I run jps command


--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez


On Mon, Aug 6, 2018 at 12:13 PM miki haiat <miko5054@xxxxxxxxx> wrote:
Did you start job manager and task manager on the same resbery pi ?

On Mon, 6 Aug 2018, 12:01 Felipe Gutierrez, <felipe.o.gutierrez@xxxxxxxxx> wrote:
Hello everyone,

I am trying to run Flink on Raspberry Pis. My first test for word count in a single node worked. I just have to decrease the Heap memory of the jobmanager.heap.mb and taskmanager.heap.mb to 512.
My second test is to add 2 slave nodes I got the error: "Java HotSpot(TM) Client VM warning: G1 GC is disabled in this release." at the file log/flink-root-taskexecutor-0-*.out.

This link (https://blog.sflow.com/2016/06/raspberry-pi-real-time-network-analytics.html) says that in order to Raspberry Pi ARM architecture works with JVM it is necessary to configure the JVM as:
-Xms600M
-Xmx600M
-XX:+UseParNewGC
-XX:+UseConcMarkSweepGC
-XX:+CMSIncrementalMode

then I set this variables on the path inside the file flink-conf.yaml
env.java.opts: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
env.java.opts.jobmanager: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
env.java.opts.taskmanager: "-XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"

and the error "Java HotSpot(TM) Client VM warning: G1 GC is disabled in this release." is not showing anymore. However, the connection from the master node to the slave node is still not possible. Does anybody know how I must configure flink to deal with that?

This is the error stack trace:

2017-05-25 12:40:26,421 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Socket Stream -> Flat Map (1/1) (b81b6492fc0860367be422d0b0bf4358) switched from DEPLOYING to RUNNING.
2017-05-25 12:40:26,891 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Source: Socket Stream -> Flat Map (1/1) (b81b6492fc0860367be422d0b0bf4358) switched from RUNNING to FAILED.
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:745)
2017-05-25 12:40:26,898 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Job Socket Window WordCount (71c6d7796eccf6587d9d1deda0490e09) switched from state RUNNING to FAILING.
java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at org.apache.flink.streaming.api.functions.source.SocketTextStreamFunction.run(SocketTextStreamFunction.java:96)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:87)
at org.apache.flink.streaming.api.operators.StreamSource.run(StreamSource.java:56)
at org.apache.flink.streaming.runtime.tasks.SourceStreamTask.run(SourceStreamTask.java:99)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:306)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:703)
at java.lang.Thread.run(Thread.java:745)
2017-05-25 12:40:26,921 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b) switched from RUNNING to CANCELING.
2017-05-25 12:40:26,975 INFO  org.apache.flink.runtime.executiongraph.ExecutionGraph        - Window(TumblingProcessingTimeWindows(5000), ProcessingTimeTrigger, ReduceFunction$1, PassThroughWindowFunction) -> Sink: Print to Std. Out (1/1) (aa1a0e7ee3a1d3ad8f99b2608bd64c5b) switched from CANCELING to CANCELED.



Thanks, Felipe
--
-- Felipe Gutierrez
-- skype: felipe.o.gutierrez