I have a Kafka-related challenge and am hoping someone else has faced this or has some pointers. This is NOT a *schema registry* question; it is a question about generating the schemas. I already know how I’m going to manage these schemas once they are created.

I need to manage potentially several hundred topics, sourced primarily from tables in a relational database accessible via JDBC, and there are several hundred consumers that will subscribe to them.

The relational schema changes constantly, and each change has to be carried over to the Avro schema used by the topic and by the processors.
I have a few solutions in mind:

1. Use Spark-Avro from Databricks to load the tables into a DataFrame and then write them out in Avro format, which gives me a starting point.

2. Use Avro-SQL from Landoop, but I'm not sure whether it needs an existing table or whether I can just give it arbitrary SQL.

3. Use other converters such as CSV-to-Avro or JSON-to-Avro, but each of these requires a preprocessing step to first export the tables to CSV or JSON.

4. Any other options?

The goal is to walk through the tables in the database, read their metadata, and generate Avro schemas, which would then be versioned and managed elsewhere. If the set of topics changes, we would automatically add or delete topics in Kafka. I just don't want to task the team with manually creating these Avro schemas and topics.
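A minimal sketch of that metadata walk, assuming Python and using the standard-library `sqlite3` module with SQLite's `PRAGMA table_info` as a stand-in for the JDBC source. Against a real JDBC connection, the same column name, type name, and nullability come from `DatabaseMetaData.getColumns()` (the `COLUMN_NAME`, `TYPE_NAME`, and `NULLABLE` columns). The type mapping, the `com.example.db` namespace, and all function names are hypothetical; a production mapping would also need Avro logical types (e.g. `decimal`, `timestamp-millis`) for DECIMAL and TIMESTAMP columns.

```python
import json
import sqlite3

# Hypothetical mapping from declared SQL type names to Avro primitive types.
# Unknown types fall back to "string"; DECIMAL, TIMESTAMP, etc. would need
# Avro logical types in a real deployment.
SQL_TO_AVRO = {
    "INTEGER": "long",
    "BIGINT": "long",
    "INT": "int",
    "SMALLINT": "int",
    "REAL": "float",
    "FLOAT": "double",
    "DOUBLE": "double",
    "TEXT": "string",
    "VARCHAR": "string",
    "CHAR": "string",
    "BLOB": "bytes",
    "BOOLEAN": "boolean",
}


def avro_type(sql_type):
    """Map a declared SQL type such as 'VARCHAR(255)' to an Avro primitive."""
    base = sql_type.split("(")[0].strip().upper()  # drop length/precision
    return SQL_TO_AVRO.get(base, "string")


def columns_to_avro(table, columns, namespace="com.example.db"):
    """Build an Avro record schema from (name, sql_type, nullable) tuples."""
    fields = []
    for name, sql_type, nullable in columns:
        t = avro_type(sql_type)
        if nullable:
            # "null" goes first in the union so that a null default is valid.
            fields.append({"name": name, "type": ["null", t], "default": None})
        else:
            fields.append({"name": name, "type": t})
    return {"type": "record", "name": table, "namespace": namespace,
            "fields": fields}


def sqlite_columns(conn, table):
    # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return [(r[1], r[2], r[3] == 0) for r in rows]


def generate_schemas(conn):
    """Walk every user table and return {table_name: avro_schema_dict}."""
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    return {t: columns_to_avro(t, sqlite_columns(conn, t)) for t in tables}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER NOT NULL PRIMARY KEY, "
                 "name VARCHAR(100) NOT NULL, email TEXT)")
    for table, schema in generate_schemas(conn).items():
        print(json.dumps(schema, indent=2))  # one .avsc document per table
```

Nullable columns become `["null", T]` unions with a `null` default, so adding a nullable column later yields a new schema version that can still read data written with the old one. Writing each dict out with `json.dumps` produces `.avsc` files that can be versioned alongside the DDL.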

If I'm coming at this completely out of left field, let me know.


Rahul Singh

Anant Corporation