I think a model similar to how the streaming file source works should fit this case. Have a look at ContinuousFileMonitoringFunction and ContinuousFileReaderOperator: the first emits the splits that should be processed, and the second is responsible for reading those splits. A generic version of that is what I'm proposing for the refactoring of our source interface (FLIP-27), which also comes with a prototype implementation.
I think something like this should be adaptable to your case. The split enumerator would at first emit only file splits downstream; after that it would emit the Kafka partitions that should be read. The split reader would understand both file splits and Kafka partitions and could read from both. There are still some kinks to work out when it comes to watermarks, though, since FLIP-27 is not finished.
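To make the idea a bit more concrete, here is a rough sketch of the enumerator/reader split in plain Java. It does not use the Flink or FLIP-27 interfaces at all: every name in it (HybridSourceSketch, Split, FileSplit, KafkaPartitionSplit, SplitEnumerator, SplitReader) is made up for illustration, the "reading" only produces a description string, and watermarks are ignored entirely.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

// Illustration only: none of these types are Flink classes.
public class HybridSourceSketch {

    /** A unit of work handed from the enumerator to the reader. */
    interface Split {}

    /** Hypothetical split describing one file to read. */
    static final class FileSplit implements Split {
        final String path;
        FileSplit(String path) { this.path = path; }
    }

    /** Hypothetical split describing one Kafka partition to read. */
    static final class KafkaPartitionSplit implements Split {
        final String topic;
        final int partition;
        KafkaPartitionSplit(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }
    }

    /** Emits all file splits first, then the Kafka partitions. */
    static final class SplitEnumerator {
        private final Queue<Split> pending = new ArrayDeque<>();

        SplitEnumerator(List<String> files, String topic, int numPartitions) {
            for (String file : files) {
                pending.add(new FileSplit(file));
            }
            for (int p = 0; p < numPartitions; p++) {
                pending.add(new KafkaPartitionSplit(topic, p));
            }
        }

        /** Returns the next split, or null when nothing is left. */
        Split next() { return pending.poll(); }
    }

    /** Understands both split types; here it only describes what it would read. */
    static final class SplitReader {
        String read(Split split) {
            if (split instanceof FileSplit) {
                return "file:" + ((FileSplit) split).path;
            } else if (split instanceof KafkaPartitionSplit) {
                KafkaPartitionSplit k = (KafkaPartitionSplit) split;
                return "kafka:" + k.topic + "-" + k.partition;
            }
            throw new IllegalArgumentException("Unknown split type: " + split);
        }
    }

    /** Drains the enumerator through a single reader, in emission order. */
    static List<String> run(List<String> files, String topic, int numPartitions) {
        SplitEnumerator enumerator = new SplitEnumerator(files, topic, numPartitions);
        SplitReader reader = new SplitReader();
        List<String> out = new ArrayList<>();
        for (Split s = enumerator.next(); s != null; s = enumerator.next()) {
            out.add(reader.read(s));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("a.csv", "b.csv"), "events", 2));
    }
}
```

In a real job the Kafka partitions would of course be unbounded and the splits would be distributed across parallel readers; the sketch only shows the ordering (files first, then partitions) and a reader that handles both kinds of split.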
What do you think?