Joining more than 2 streams


Hi,
I want to do a window join on multiple Kafka streams (say a, b, and c) on a field common to all three, and then apply a custom function to the joined stream. As I understand it, the DataStream API can join only two streams at a time, so I would need to join a and b first and then join that result with c. How would the stream state be stored in the state backend in that case? Since a and b are joined first, I believe both streams will be held in the state backend for the duration of the window. The second join, between the a-b result and c, would then hold all three streams for the windowed period. Does that mean streams a and b are stored twice in the state backend?

Suppose that, instead of using the built-in join API, I union all three streams (after transforming them to a common schema), keyBy the common field, and apply a process function in which I implement the join myself and keep the buffered elements in state. Would that be more storage efficient, since I would be storing the three streams just once instead of twice?
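
To make the second option concrete, here is a minimal plain-Java sketch of the buffering logic I have in mind. It is not Flink code: the UnionJoinSketch and Event names, the windowStart field, and the HashMap standing in for keyed state are all hypothetical, and in a real job I assume this would live in a KeyedProcessFunction with ListState and a timer per window.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Library-free model of "union + keyBy + custom join". Hypothetical names, not Flink API.
class UnionJoinSketch {

    // Common schema that a, b and c are mapped to before the union.
    static final class Event {
        final String source;     // "a", "b" or "c"
        final String key;        // the common join field
        final long windowStart;  // start of the window the event falls into
        final String payload;

        Event(String source, String key, long windowStart, String payload) {
            this.source = source;
            this.key = key;
            this.windowStart = windowStart;
            this.payload = payload;
        }
    }

    // Stand-in for keyed state: one buffer per (key, window); each event is stored once.
    private final Map<String, List<Event>> buffers = new HashMap<>();

    private static String stateKey(String key, long windowStart) {
        return key + "@" + windowStart;
    }

    // Called for every element of the unioned stream.
    void add(Event e) {
        buffers.computeIfAbsent(stateKey(e.key, e.windowStart), k -> new ArrayList<>()).add(e);
    }

    // Total number of events currently buffered across all keys and windows.
    int bufferedCount() {
        int n = 0;
        for (List<Event> buf : buffers.values()) {
            n += buf.size();
        }
        return n;
    }

    private static List<Event> bySource(List<Event> buf, String source) {
        List<Event> out = new ArrayList<>();
        for (Event e : buf) {
            if (e.source.equals(source)) {
                out.add(e);
            }
        }
        return out;
    }

    // What a window timer would trigger: emit every a x b x c combination
    // for this key and window, then drop the buffer (clear the state).
    List<String> fire(String key, long windowStart) {
        List<String> joined = new ArrayList<>();
        List<Event> buf = buffers.remove(stateKey(key, windowStart));
        if (buf == null) {
            return joined;
        }
        for (Event a : bySource(buf, "a")) {
            for (Event b : bySource(buf, "b")) {
                for (Event c : bySource(buf, "c")) {
                    joined.add(key + ":" + a.payload + "," + b.payload + "," + c.payload);
                }
            }
        }
        return joined;
    }
}
```

In this model every incoming element sits in exactly one buffer until its window fires, which is the "stored just once" behaviour I am asking about. Whether the chained built-in joins really hold more than this is the part I am unsure of.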

Gagan