Re: Error restoring from savepoint while there's no modification to the job


Hi,

Adding to Dawid's questions: it would also be very helpful to know which Flink version was used to create the savepoint, which Flink version was used in the restore attempt, and whether the savepoint was moved or modified in between. Barring a conflict in one of those areas, I would not expect an error like this.

Best,
Stefan

> On 10. Oct 2018, at 09:51, Dawid Wysakowicz <dwysakowicz@xxxxxxxxxx> wrote:
> 
> Hi Averell,
> 
> Are you trying to scale the job up, i.e. did you increase the job
> parallelism? Did you by any chance also increase the job's max
> parallelism? If so, that is not supported. The max parallelism parameter
> determines the key groups, which are then assigned to the parallel
> operator instances. It cannot be changed for a job that is to be
> restored.
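
To illustrate the point about key groups, here is a minimal, self-contained sketch of how a key could be mapped to a key group and then to a parallel subtask. It is not Flink's code: the class and method names are hypothetical, and a plain modulo of `hashCode()` stands in for the hash scrambling that Flink applies. The arithmetic is only modelled on the idea described above, and it is not meant as a diagnosis of the "Unexpected key-group in restore" error in this thread.

```java
public class KeyGroupSketch {

    // Maps a key to one of maxParallelism key groups.
    // Assumption: a plain modulo of hashCode() is used for readability;
    // Flink additionally scrambles the hash code before taking the modulo.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    // Index of the parallel subtask that owns the given key group.
    static int subtaskFor(int keyGroup, int maxParallelism, int parallelism) {
        return keyGroup * parallelism / maxParallelism;
    }

    // First and last key group (both inclusive) owned by a subtask.
    static int[] keyGroupRange(int subtask, int maxParallelism, int parallelism) {
        int start = (subtask * maxParallelism + parallelism - 1) / parallelism;
        int end = ((subtask + 1) * maxParallelism - 1) / parallelism;
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        int parallelism = 64;   // same parallelism before and after
        Integer key = 200;      // hypothetical key

        // Same key, same parallelism, two different max parallelism values.
        for (int maxParallelism : new int[] {128, 256}) {
            int keyGroup = keyGroupFor(key, maxParallelism);
            int subtask = subtaskFor(keyGroup, maxParallelism, parallelism);
            int[] range = keyGroupRange(subtask, maxParallelism, parallelism);
            System.out.printf(
                "maxParallelism=%d: key %d -> key group %d, subtask %d owns [%d, %d]%n",
                maxParallelism, key, keyGroup, subtask, range[0], range[1]);
        }
    }
}
```

In this sketch the same key lands in a different key group, owned by a different subtask, once the max parallelism changes. State written under the old key-group layout therefore no longer lines up with the ranges each subtask expects, which is why the parameter has to stay fixed across a restore.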
> 
> If this is not the case, maybe Stefan (cc) has some ideas about what
> could be going wrong.
> 
> Best,
> 
> Dawid
> 
> 
> On 10/10/18 09:23, Averell wrote:
>> Hi everyone,
>> 
>> I'm getting the following error when trying to restore from a savepoint.
>> Below is the output from the flink binary, and a TM log is attached.
>> I did not change the app between taking the savepoint and restoring it.
>> All window operators have been assigned a unique ID string.
>> 
>> Could you please take a look?
>> 
>> Thanks and best regards,
>> Averell
>> 
>> taskmanager.gz
>> <http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/file/t1586/taskmanager.gz>  
>> 
>> org.apache.flink.client.program.ProgramInvocationException: Job failed. (JobID: 606ad5239f5e23cedb85d3e75bf76463)
>> 	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:268)
>> 	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:487)
>> 	at org.apache.flink.streaming.api.environment.StreamContextEnvironment.execute(StreamContextEnvironment.java:66)
>> 	at org.apache.flink.streaming.api.scala.StreamExecutionEnvironment.execute(StreamExecutionEnvironment.scala:664)
>> 	at com.nbnco.csa.analysis.copper.sdc.flink.StreamingSdcWithAverageByDslam$.main(StreamingSdcWithAverageByDslam.scala:442)
>> 	at com.nbnco.csa.analysis.copper.sdc.flink.StreamingSdcWithAverageByDslam.main(StreamingSdcWithAverageByDslam.scala)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>> 	at java.lang.reflect.Method.invoke(Method.java:498)
>> 	at org.apache.flink.client.program.PackagedProgram.callMainMethod(PackagedProgram.java:529)
>> 	at org.apache.flink.client.program.PackagedProgram.invokeInteractiveModeForExecution(PackagedProgram.java:421)
>> 	at org.apache.flink.client.program.ClusterClient.run(ClusterClient.java:427)
>> 	at org.apache.flink.client.cli.CliFrontend.executeProgram(CliFrontend.java:813)
>> 	at org.apache.flink.client.cli.CliFrontend.runProgram(CliFrontend.java:287)
>> 	at org.apache.flink.client.cli.CliFrontend.run(CliFrontend.java:213)
>> 	at org.apache.flink.client.cli.CliFrontend.parseParameters(CliFrontend.java:1050)
>> 	at org.apache.flink.client.cli.CliFrontend.lambda$main$11(CliFrontend.java:1126)
>> 	at java.security.AccessController.doPrivileged(Native Method)
>> 	at javax.security.auth.Subject.doAs(Subject.java:422)
>> 	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
>> 	at org.apache.flink.runtime.security.HadoopSecurityContext.runSecured(HadoopSecurityContext.java:41)
>> 	at org.apache.flink.client.cli.CliFrontend.main(CliFrontend.java:1126)
>> Caused by: org.apache.flink.runtime.client.JobExecutionException: Job execution failed.
>> 	at org.apache.flink.runtime.jobmaster.JobResult.toJobExecutionResult(JobResult.java:146)
>> 	at org.apache.flink.client.program.rest.RestClusterClient.submitJob(RestClusterClient.java:265)
>> 	... 22 more
>> Caused by: java.lang.Exception: Exception while creating StreamOperatorStateContext.
>> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:192)
>> 	at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:250)
>> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:738)
>> 	at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:289)
>> 	at org.apache.flink.runtime.taskmanager.Task.run(Task.java:711)
>> 	at java.lang.Thread.run(Thread.java:748)
>> Caused by: org.apache.flink.util.FlinkException: Could not restore keyed state backend for WindowOperator_b7287b12f90aa788ab162856424c6d40_(8/64) from any of the 1 provided restore options.
>> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:137)
>> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:279)
>> 	at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:133)
>> 	... 5 more
>> Caused by: java.lang.IllegalStateException: Unexpected key-group in restore.
>> 	at org.apache.flink.util.Preconditions.checkState(Preconditions.java:195)
>> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.readStateHandleStateData(HeapKeyedStateBackend.java:475)
>> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restorePartitionedState(HeapKeyedStateBackend.java:438)
>> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:377)
>> 	at org.apache.flink.runtime.state.heap.HeapKeyedStateBackend.restore(HeapKeyedStateBackend.java:105)
>> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:151)
>> 	at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:123)
>> 
>> 
>> 
>> --
>> Sent from: http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/
> 
>