Re: Kafka Indexing Service - Decoupling segments from consumer tasks

Hey Dylan,

Great to hear that your experience has generally been positive!

What do you think about using compaction for this? (The feature added in [...].) The idea with compaction was
that it would enable a background process that goes through freshly
inserted segments and re-partitions them optimally.
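
For reference, here is a rough sketch of what a compaction task payload could look like, built in Python so it is easy to script. The datasource name "events" and the interval are made up for illustration, and the field names reflect my understanding of the compaction task, so please check them against the docs for your Druid version before relying on them.

```python
import json


def build_compaction_task(datasource: str, interval: str) -> dict:
    """Build a minimal compaction task payload.

    Assumption: the task type is "compact" and it takes a dataSource
    plus an ISO-8601 interval; verify against your Druid version.
    """
    return {
        "type": "compact",
        "dataSource": datasource,
        "interval": interval,
    }


if __name__ == "__main__":
    # Hypothetical datasource and interval, for illustration only.
    task = build_compaction_task("events", "2018-05-01/2018-05-02")
    print(json.dumps(task, indent=2))
```

The resulting JSON would be submitted to the overlord like any other task. The sketch deliberately stops short of making the HTTP call.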

For creating multiple datasources out of one topic, there is a PR wending
its way through review right now that is relevant:

On Wed, May 2, 2018 at 12:46 PM, Dylan Wylie <dylanwylie@xxxxxxxxx> wrote:

> Hey there,
> With the recent improvements to the Kafka Indexing Service we've been
> migrating over from Tranquility and have had a very positive experience.
> However, one of the downsides to using the KIS is that the number of
> segments generated for each period can't be smaller than the number of
> tasks required to consume the queue. So if you have a use case involving
> ingesting from a topic with a high rate of large messages but your spec
> only extracts a small proportion of fields, you may be forced to run a large
> number of tasks that generate very small segments.
> This email is to check in for people's thoughts on separating consuming and
> parsing messages from indexing and segment management, in a similar fashion
> to how Tranquility operates.
> Potentially, we could have the supervisor spawn two types of task that can
> be configured independently, a consumer and an appender. The consumer would
> parse the message based on the spec and then pass the results to the
> appropriate appender task which builds the segment. Another advantage to
> this approach is that it would allow creating multiple datasources from a
> single consumer group rather than ingesting the same topic multiple times.
> I'm quite new to the codebase so all thoughts and comments are welcome!
> Best regards,
> Dylan
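
To check that I am reading the consumer/appender proposal correctly, here is a toy sketch of the split in plain Python. Nothing here is Druid code: Consumer, Appender, and pick are hypothetical names, the messages are assumed to be JSON, and the "appender" just collects rows in memory where a real task would build segments.

```python
import json
from typing import Callable, Dict, List, Optional

Row = Dict[str, object]
Parser = Callable[[bytes], Optional[Row]]


class Appender:
    """Hypothetical stand-in for a task that builds segments for one datasource."""

    def __init__(self, datasource: str) -> None:
        self.datasource = datasource
        self.rows: List[Row] = []

    def append(self, row: Row) -> None:
        self.rows.append(row)


class Consumer:
    """Hypothetical consumer: reads each message once, parses it per spec,
    and hands the resulting row to the matching appender."""

    def __init__(self, parsers: Dict[str, Parser]) -> None:
        self.parsers = parsers
        self.appenders = {name: Appender(name) for name in parsers}

    def handle(self, message: bytes) -> None:
        for name, parse in self.parsers.items():
            row = parse(message)
            if row is not None:  # a spec may not apply to every message
                self.appenders[name].append(row)


def pick(*fields: str) -> Parser:
    """Build a parser that keeps only the named fields of a JSON message."""

    def parse(message: bytes) -> Optional[Row]:
        record = json.loads(message)
        if not all(f in record for f in fields):
            return None
        return {f: record[f] for f in fields}

    return parse


if __name__ == "__main__":
    # Two datasources fed from a single pass over the topic.
    consumer = Consumer({"clicks": pick("ts", "url"), "users": pick("ts", "user")})
    consumer.handle(b'{"ts": 1, "url": "/a", "user": "u1", "payload": "big"}')
    for name, appender in consumer.appenders.items():
        print(name, appender.rows)
```

The point of the split is that the number of appenders (and so the number of segments) is configured separately from the number of consumers needed to keep up with the topic.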