git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Understanding GenerateSequence and SideInputs


Hi,

Not sure if this is useful for your use case but as you are using BQ with a changing schema the following may also be a interesting read ...

https://cloud.google.com/blog/big-data/2018/02/how-to-handle-mutating-json-schemas-in-a-streaming-pipeline-with-square-enix

Cheers

Reza



On Fri, May 25, 2018, 5:50 AM Raghu Angadi <rangadi@xxxxxxxxxx> wrote:

On Thu, May 24, 2018 at 1:11 PM Carlos Alonso <carlos@xxxxxxxxxxxxx> wrote:
Hi everyone!!

I'm building a pipeline to store streaming data into BQ and I'm using the pattern: Slowly changing lookup cache described here: https://cloud.google.com/blog/big-data/2017/06/guide-to-common-cloud-dataflow-use-case-patterns-part-1 to hold and refresh the table schemas (as they may change from time to time).

Now I'd like to understand how that is scheduled on a distributed system. Who is running that code? One random node? One node but always the same? All nodes?

GenerateSequence() is uses an unbounded source. Like any unbounded source, it can has a set of 'splits' ('desiredNumSplits' [1] is set by runtime). Each of the splits run in parallel.. a typical runtime distributes these across the workers. Typically they stay on a worker unless there is a reason to redistribute (autoscaling, workers unresponsive etc). W.r.t. user application there are no guarantees about affinity. 

 

Also, what are the GenerateSequence guarantees in terms of precision? I have it configured to generate 1 element every 5 minutes and most of the time it works exact, but sometimes it doesn't... Is that expected?

Each of the splits mentioned above essentially runs 'advance() [2]' in a loop. It check current walltime to decide if it needs to emit next element. If the value you see off by a few seconds, it would imply 'advance()' was not called during that time by the framework. Runtime frameworks usually don't provide any hard or soft deadlines for scheduling work.



Regards