git.net

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Kafka Per-Partition Watermarks


Yes, my job does do a keyBy.  It never occurred to me that keyBy would distributed data from different partitions to different tasks, but, now that you mention it, it actually makes perfect sense.  Thanks you for the help.

On Thu, Oct 4, 2018 at 5:11 PM Elias Levy <fearsome.lucidity@xxxxxxxxx> wrote:
Does your job perform a keyBy or broadcast that would result in data from different partitions being distributed among tasks?  If so, then that would be the cause. 

On Thu, Oct 4, 2018 at 12:58 PM Andrew Kowpak <andrew.kowpak@xxxxxxxxxxxx> wrote:
Hi all,

I apologize if this has been discussed to death in the past, but, I'm finding myself very confused, and google is not proving helpful.

Based on the documentation, I understand that if there are idle partitions in a kafka stream, watermarks will not advance for the entire application.  I was hoping that by setting parallelism = the number of partitions that I would be able to work around the issue, but, this didn't work.  I'm totally willing to accept the fact that if I have idle partitions, my windowed partitions won't work, however, I would really like to understand why setting the parallelism didn't work.  If someone can explain, or perhaps point me to documentation or code, it would be very much appreciated.

Thanks.

--
Andrew Kowpak P.Eng Sr. Software Engineer
(519)  489 2688 SSIMWAVE Inc.
402-140 Columbia Street West, Waterloo ON



--
Andrew Kowpak P.Eng Sr. Software Engineer
(519)  489 2688 SSIMWAVE Inc.
402-140 Columbia Street West, Waterloo ON