Kafka Indexing Service - Decoupling segments from consumer tasks


Hey there,

With the recent improvements to the Kafka Indexing Service we've been
migrating over from Tranquility and have had a very positive experience.

However, one downside of using the KIS is that the number of segments
generated for each period can't be smaller than the number of tasks
required to consume the topic. So if your use case involves ingesting from
a topic with a high rate of large messages, but your spec only extracts a
small proportion of the fields, you may be forced to run a large number of
tasks that each generate very small segments.

This email is to ask for people's thoughts on separating consuming and
parsing messages from indexing and segment management, in a similar fashion
to how Tranquility operates.

One option would be to have the supervisor spawn two types of task that can
be configured independently: a consumer and an appender. The consumer would
parse each message based on the spec and then pass the result to the
appropriate appender task, which builds the segment. Another advantage of
this approach is that it would allow creating multiple datasources from a
single consumer group, rather than ingesting the same topic multiple times.
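
To make the consumer/appender split a bit more concrete, here is a rough toy
sketch in Python. It is not Druid code: the names (parse, hour_bucket,
Appender, run), the hourly segment granularity and the CRC-based routing are
all assumptions made up for illustration. The point is only that the number
of segments follows the number of appenders and time buckets, not the number
of consumers.

```python
import zlib
from collections import defaultdict
from datetime import datetime, timezone


def parse(message, fields):
    """Consumer side: keep only the fields the (hypothetical) spec extracts."""
    return {name: message[name] for name in fields if name in message}


def hour_bucket(timestamp_ms):
    """Map an epoch-millisecond timestamp to the start of its UTC hour."""
    dt = datetime.fromtimestamp(timestamp_ms / 1000, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0).isoformat()


class Appender:
    """Appender side: builds one in-memory 'segment' per time bucket."""

    def __init__(self):
        self.segments = defaultdict(list)

    def append(self, bucket, row):
        self.segments[bucket].append(row)


def run(consumer_inputs, num_appenders, fields):
    """Simulate N consumers feeding M appenders.

    consumer_inputs holds one list of raw messages per consumer. Every row
    for a given time bucket is routed to the same appender, so the segment
    count does not grow with the number of consumers.
    """
    appenders = [Appender() for _ in range(num_appenders)]
    for messages in consumer_inputs:  # one iteration per consumer
        for message in messages:
            row = parse(message, fields)
            bucket = hour_bucket(row["timestamp"])
            # Assumed routing rule: a stable hash of the bucket picks the appender.
            target = zlib.crc32(bucket.encode("utf-8")) % num_appenders
            appenders[target].append(bucket, row)
    return appenders
```

With, say, four consumers reading large messages that all fall in the same
hour and a single appender, this produces one segment for that hour instead
of four small ones.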

I'm quite new to the codebase so all thoughts and comments are welcome!

Best regards,
Dylan

