git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Understanding GenerateSequence and SideInputs


Hi everyone!!

I'm building a pipeline to store streaming data into BQ and I'm using the pattern: Slowly changing lookup cache described here: https://cloud.google.com/blog/big-data/2017/06/guide-to-common-cloud-dataflow-use-case-patterns-part-1 to hold and refresh the table schemas (as they may change from time to time).

Now I'd like to understand how that is scheduled on a distributed system. Who is running that code? One random node? One node but always the same? All nodes?

Also, what are the GenerateSequence guarantees in terms of precision? I have it configured to generate 1 element every 5 minutes and most of the time it works exact, but sometimes it doesn't... Is that expected?

Regards