
Re: Kafka Per-Partition Watermarks


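Does your job perform a keyBy or broadcast that would result in data from different partitions being distributed among tasks? If so, then that would be the cause: a task that receives input from several upstream subtasks only advances its watermark to the minimum across all of them, so a single idle partition holds back every task downstream of the redistribution.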
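To illustrate the reply, here is a minimal toy model in plain Java. It is not Flink code, and the class, method names, and the use of `Long.MIN_VALUE` as "no watermark yet" are assumptions for illustration. It encodes one rule: a task's watermark is the minimum over its input channels. With a forward connection each downstream task sees one source subtask; after a keyBy every downstream task sees all of them.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model, not Flink code: a downstream task's event-time
 * watermark is the minimum of the latest watermarks on each input channel.
 */
public class WatermarkPropagationSketch {

    // Assumed stand-in for "this channel has not emitted any watermark yet".
    static final long NO_WATERMARK = Long.MIN_VALUE;

    // Combined watermark of one task: the minimum over its input channels.
    static long combinedWatermark(long[] inputChannelWatermarks) {
        return Arrays.stream(inputChannelWatermarks).min().orElse(NO_WATERMARK);
    }

    // Forward connection (no keyBy): downstream task i reads only from source subtask i.
    static long[] forwardWatermarks(long[] sourceWatermarks) {
        long[] result = new long[sourceWatermarks.length];
        for (int i = 0; i < sourceWatermarks.length; i++) {
            result[i] = combinedWatermark(new long[] {sourceWatermarks[i]});
        }
        return result;
    }

    // keyBy (hash redistribution): every downstream task has a channel from every source subtask.
    static long[] keyByWatermarks(long[] sourceWatermarks, int downstreamParallelism) {
        long[] result = new long[downstreamParallelism];
        Arrays.fill(result, combinedWatermark(sourceWatermarks));
        return result;
    }

    public static void main(String[] args) {
        // Three source subtasks reading one Kafka partition each; the third partition is idle.
        long[] sourceWatermarks = {1_000L, 1_200L, NO_WATERMARK};
        System.out.println("forward: " + Arrays.toString(forwardWatermarks(sourceWatermarks)));
        System.out.println("keyBy:   " + Arrays.toString(keyByWatermarks(sourceWatermarks, 3)));
    }
}
```

In this model the forward case leaves the first two downstream tasks at 1000 and 1200 while only the third is stuck; after the keyBy all three stay at `Long.MIN_VALUE`. A broadcast behaves like the keyBy case here, since each downstream task again has a channel from every upstream subtask.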

On Thu, Oct 4, 2018 at 12:58 PM Andrew Kowpak <andrew.kowpak@xxxxxxxxxxxx> wrote:
Hi all,

I apologize if this has been discussed to death in the past, but I'm finding myself very confused, and Google is not proving helpful.

Based on the documentation, I understand that if there are idle partitions in a Kafka stream, watermarks will not advance for the entire application. I was hoping that by setting the parallelism equal to the number of partitions, I would be able to work around the issue, but this didn't work. I'm totally willing to accept that if I have idle partitions, my windowed operations won't work; however, I would really like to understand why setting the parallelism didn't work. If someone can explain, or perhaps point me to documentation or code, it would be very much appreciated.
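The per-partition part of the question can be sketched the same way. This is a hypothetical toy model, not the Flink Kafka connector: it assumes each source subtask keeps one watermark per assigned partition and emits the minimum, and it uses a made-up `partition % parallelism` assignment. It shows what setting the parallelism equal to the partition count does and does not change.

```java
import java.util.Arrays;

/**
 * Hypothetical toy model, not the Flink Kafka connector: each source subtask
 * tracks one watermark per assigned partition and emits the minimum of them.
 */
public class PerPartitionWatermarkSketch {

    // Assumed stand-in for "this partition has produced no records, so no watermark yet".
    static final long NO_WATERMARK = Long.MIN_VALUE;

    // Hypothetical assignment: partition p goes to subtask p % parallelism.
    static int subtaskFor(int partition, int parallelism) {
        return partition % parallelism;
    }

    // Watermark emitted by each source subtask: the minimum over its own partitions.
    // A subtask with no assigned partitions keeps NO_WATERMARK in this model.
    static long[] subtaskWatermarks(long[] partitionWatermarks, int parallelism) {
        long[] result = new long[parallelism];
        boolean[] hasPartition = new boolean[parallelism];
        Arrays.fill(result, NO_WATERMARK);
        for (int p = 0; p < partitionWatermarks.length; p++) {
            int s = subtaskFor(p, parallelism);
            result[s] = hasPartition[s] ? Math.min(result[s], partitionWatermarks[p]) : partitionWatermarks[p];
            hasPartition[s] = true;
        }
        return result;
    }

    public static void main(String[] args) {
        // Four partitions; partition 2 is idle and has never produced a watermark.
        long[] partitionWatermarks = {1_000L, 1_200L, NO_WATERMARK, 900L};
        System.out.println("parallelism 2: " + Arrays.toString(subtaskWatermarks(partitionWatermarks, 2)));
        System.out.println("parallelism 4: " + Arrays.toString(subtaskWatermarks(partitionWatermarks, 4)));
    }
}
```

With parallelism 2 the idle partition stalls half of the source subtasks; with parallelism 4 it stalls only subtask 2. That isolation only survives while each downstream task keeps reading from a single source subtask. Any operator fed by all source subtasks still takes the minimum of their watermarks and stays stuck.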

Thanks.

--
Andrew Kowpak P.Eng Sr. Software Engineer
(519)  489 2688 SSIMWAVE Inc.
402-140 Columbia Street West, Waterloo ON