git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Apache Beam/Dataflow - using custom message id and timestamp


Hello beam team,

I'm reading from a Kafka topic and writing to Pubsub. The code constructs PubsubMessage objects from the Kafka messages

I'm setting some attributes for the PubsubMessage objects, namely message_id and message_timestamp2

I wish to use these attribute values downstream, so that subscribing dataflow jobs use the message_id for exactly once processing and the message_timestamp2 for windowing. Therefore, when I write to Pubsub, I do this
PubsubIO.writeMessages()
.withIdAttribute("message_id")
.withTimestampAttribute("message_timestamp2").to(options.getOutputTopic()));
However, instead of setting the values downstream with these attribute values, dataflow is updating them attributes with the default message_id and the process time.

What am I doing wrong?

Pythian

          love your data


Terry Dhariwal | Solutions Architect
t  +44 (0) 7460 877 412
w www.pythian.com

--