Re: Kafka Indexing Service - Decoupling segments from consumer tasks
Great to hear that your experience has generally been positive!
What do you think about using compaction for this? (The feature added in
https://github.com/druid-io/druid/pull/5102.) The idea with compaction was
that it would enable a background process that goes through freshly
inserted segments and re-partitions them optimally.
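To make that concrete, here is a minimal sketch of what a manually submitted compaction task might look like. It assumes the task spec uses the fields `type`, `dataSource`, and `interval`, as I recall them from the compaction task docs, so please check them against your Druid version. The datasource name and interval are placeholders, and `build_compaction_task` is a hypothetical helper, not part of Druid.

```python
import json


def build_compaction_task(data_source, interval):
    """Build a compaction task payload as a plain dict.

    Field names are an assumption based on the Druid compaction task docs;
    verify them against the version you are running.
    """
    return {
        "type": "compact",
        "dataSource": data_source,
        "interval": interval,  # ISO-8601 interval covering the segments to merge
    }


# Placeholder datasource and interval, for illustration only.
task = build_compaction_task("my_kafka_datasource", "2018-05-01/2018-05-02")
payload = json.dumps(task, indent=2)
print(payload)
```

The resulting JSON would then be POSTed to the overlord's task endpoint (`/druid/indexer/v1/task`, assuming a default setup), either by hand or from a scheduled job, until an automatic background process exists.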
For creating multiple datasources out of one topic, there is a PR wending
its way through review right now that is relevant: https://github.com/
On Wed, May 2, 2018 at 12:46 PM, Dylan Wylie <dylanwylie@xxxxxxxxx> wrote:
> Hey there,
> With the recent improvements to the Kafka Indexing Service we've been
> migrating over from Tranquility and have had a very positive experience.
> However, one of the downsides to using the KIS is that the number of
> segments generated for each period can't be smaller than the number of
> tasks required to consume the queue. So if your use case involves
> ingesting from a topic with a high rate of large messages, but your spec
> only extracts a small proportion of the fields, you may be forced to run
> a large number of tasks that each generate very small segments.
> This email is to ask for people's thoughts on separating consuming and
> parsing messages from indexing and segment management, in a similar fashion
> to how Tranquility operates.
> Potentially, we could have the supervisor spawn two types of task that can
> be configured independently: a consumer and an appender. The consumer would
> parse the message based on the spec and then pass the results to the
> appropriate appender task which builds the segment. Another advantage to
> this approach is that it would allow creating multiple datasources from a
> single consumer group rather than ingesting the same topic multiple times.
> I'm quite new to the codebase so all thoughts and comments are welcome!
> Best regards,
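
For reference, the consumer/appender split proposed in the quoted message can be sketched roughly as follows. This is only an illustration under stated assumptions, not how the Kafka Indexing Service or Tranquility is implemented: `SPECS`, `Appender`, `consume`, and `run_demo` are hypothetical names, a "segment" is just a list of rows per hour, and tasks are modelled as plain function calls rather than separate processes.

```python
import json
from collections import defaultdict

# Hypothetical per-datasource specs: the fields each datasource keeps.
SPECS = {
    "clicks_by_user": ["ts", "user"],
    "clicks_by_country": ["ts", "country"],
}


class Appender:
    """Hypothetical appender: collects rows into one 'segment' per hour."""

    def __init__(self, datasource):
        self.datasource = datasource
        self.segments = defaultdict(list)

    def append(self, row):
        hour = row["ts"][:13]  # e.g. "2018-05-02T12"
        self.segments[hour].append(row)


def consume(raw_messages, appenders):
    """Hypothetical consumer: parse each message once, project it per spec,
    and hand the small row to the matching appender."""
    for raw in raw_messages:
        message = json.loads(raw)
        for datasource, fields in SPECS.items():
            row = {field: message[field] for field in fields}
            appenders[datasource].append(row)


def run_demo():
    # Three partitions stand in for three consumer tasks reading large messages.
    partitions = [
        ['{"ts": "2018-05-02T12:01:00Z", "user": "a", "country": "GB", "payload": "xxxx"}'],
        ['{"ts": "2018-05-02T12:02:00Z", "user": "b", "country": "US", "payload": "yyyy"}'],
        ['{"ts": "2018-05-02T12:03:00Z", "user": "c", "country": "GB", "payload": "zzzz"}'],
    ]
    appenders = {name: Appender(name) for name in SPECS}
    for partition in partitions:
        consume(partition, appenders)
    return appenders


appenders = run_demo()
for name, appender in appenders.items():
    for hour, rows in appender.segments.items():
        print(name, hour, len(rows))
```

In this toy version, three consumers feed a single appender per datasource, so each hour ends up with one segment per datasource regardless of how many consumers were needed, and the topic is read only once for both datasources.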