

Kafka to HDFS flow: Avro schema evolution


Hi everybody,

We are planning to use Flink for our Kafka-to-HDFS ingestion. We consume Avro messages encoded as JSON and write them to HDFS as Parquet. However, our Avro schema changes from time to time in a backward-compatible way, and because of deployments you can see messages like

v1 v1 v1 v1 v2 v2 v1 v2 v1 ..... v2 v2 v2

I mean that during a deployment there is a small time window in which messages with mixed versions can arrive. I just want to ask whether Flink/Avro/Parquet can handle this scenario, or whether there is something we need to do ourselves. Thanks in advance.
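As a rough illustration of the behavior we are hoping for, here is a minimal Python sketch that mimics Avro's reader-schema resolution rules in simplified form. Every message, v1 or v2, is projected onto the latest (v2) reader schema: a field missing from an older message takes the reader's default, and a field the reader does not know is dropped. The schema, the field names (`id`, `user`, `country`) and the `resolve` helper are hypothetical. This is not Avro's or Flink's API, and it skips the step where real Avro decoding needs each message's writer schema.

```python
import json

# Hypothetical v2 reader schema: v2 added a "country" field with a default.
# Field names and the default value are illustrative assumptions.
READER_SCHEMA_V2 = {
    "type": "record",
    "name": "Event",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "user", "type": "string"},
        {"name": "country", "type": "string", "default": "unknown"},
    ],
}

_MISSING = object()  # sentinel so an explicit null is not mistaken for "absent"


def resolve(record, reader_schema):
    """Project an already-decoded record onto the reader schema.

    Simplified version of Avro's resolution rules: reader fields absent
    from the record take their default (or fail if there is none), and
    record fields unknown to the reader schema are ignored.
    """
    out = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        value = record.get(name, _MISSING)
        if value is _MISSING:
            if "default" not in field:
                raise ValueError(f"field {name!r} is missing and has no default")
            value = field["default"]
        out[name] = value
    return out


# A mixed stream as it might look mid-deployment (v1, v2, v1).
stream = [
    '{"id": 1, "user": "a"}',
    '{"id": 2, "user": "b", "country": "TR"}',
    '{"id": 3, "user": "c"}',
]
rows = [resolve(json.loads(m), READER_SCHEMA_V2) for m in stream]
```

Because every row ends up with the same v2 shape, a single Parquet schema could be used for the output files. Whether Flink's Avro/Parquet integration does this for us is exactly the question above.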