Re: Error restoring from savepoint while there's no modification to the job


Hi Averell,

Are you trying to scale the job up, i.e. did you increase the job
parallelism? And did you by any chance also change the job's max
parallelism? Changing the max parallelism is not supported. The max
parallelism parameter determines the number of key groups, which are
then assigned to the parallel operator instances. It cannot be changed
for a job that is to be restored from a savepoint.
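
To illustrate why the key-group layout is tied to the max parallelism, here is a minimal, self-contained sketch. It does not call Flink: the class name KeyGroupSketch and its helpers are hypothetical, the hash step is simplified (Flink additionally applies a murmur hash to the key's hash code), and the values 128 and 256 are only example settings, not values taken from this job. The range arithmetic follows my understanding of how Flink splits key groups across subtasks, so treat it as an approximation.

```java
public class KeyGroupSketch {

    // Simplified stand-in for the key -> key group step. Assumption: Flink also
    // runs a murmur hash over key.hashCode(); that step is left out here.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // First and last key group (both inclusive) owned by one parallel subtask.
    static int[] keyGroupRangeFor(int subtaskIndex, int parallelism, int maxParallelism) {
        int start = (subtaskIndex * maxParallelism + parallelism - 1) / parallelism;
        int end = ((subtaskIndex + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int parallelism = 64;
        int subtaskIndex = 7; // zero-based; task names show it as "(8/64)"
        // 128 and 256 are hypothetical max parallelism values for comparison.
        for (int maxParallelism : new int[] {128, 256}) {
            int[] range = keyGroupRangeFor(subtaskIndex, parallelism, maxParallelism);
            System.out.println("maxParallelism=" + maxParallelism
                    + ": key 200 -> key group " + keyGroupFor(200, maxParallelism)
                    + ", subtask 8/64 owns key groups [" + range[0] + ", " + range[1] + "]");
        }
    }
}
```

With the same parallelism, both the key group a key falls into and the range a subtask owns depend on the max parallelism, so state written under one setting does not line up with the ranges computed under another.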

If this is not the case, maybe Stefan (cc) has some ideas about what
could be going wrong.

Best,

Dawid


On 10/10/18 09:23, Averell wrote:
> Hi everyone,
>
> I'm getting the following error when trying to restore from a savepoint.
> Below is the output from the flink CLI, and a TM log is attached.
> I made no changes to the app between taking the savepoint and restoring
> from it. All Window operators have been assigned a unique ID string.
>
> Could you please take a look?
>
> Thanks and best regards,
> Averell
>
> taskmanager.gz
> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/taskmanager.gz>  
>
> org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 606ad5239f5e23cedb85d3e75bf76463)
> 	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
> 	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487)
> 	at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
> 	at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:664)
> 	at com.nbnco.csa.analysis.copper.sdc.flink.StreamingSdcWithAverageByDslam$.main(StreamingSdcWithAverageByDslam.scala:442)
> 	at com.nbnco.csa.analysis.copper.sdc.flink.StreamingSdcWithAverageByDslam.main(StreamingSdcWithAverageByDslam.scala)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
> 	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
> 	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
> 	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
> 	at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
> 	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
> 	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
> 	at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
> 	at java.security.AccessController.doPrivileged(Native Method)
> 	at javax.security.auth.Subject.doAs(Subject.java:422)
> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
> 	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
> 	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
> 	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
> 	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265)
> 	... 22 more
> Caused by: java.lang.Exception: Exception while creating StreamOperatorStateContext.
> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:192)
> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
> 	at java.lang.Thread.run(Thread.java:748)
> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_b7287b12f90aa788ab162856424c6d40_(8/64) from any of the 1 provided restore options.
> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:279)
> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:133)
> 	... 5 more
> Caused by: java.lang.IllegalStateException: Unexpected key-group in restore.
> 	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:475)
> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:438)
> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:377)
> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:105)
> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
>
>
>
> --
> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

