Avro >> mail # user >> ETL in face of column renames


ETL in face of column renames
Dear list,

I have what I imagine is a standard setup: a web application generates
data in MySQL, which I want to analyze in Hadoop; I run a nightly
process to extract tables of interest, Avroize, and dump into HDFS.
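
A minimal sketch of the "Avroize" step, assuming each table's columns are
available as (name, MySQL type, nullable) tuples. The type mapping and the
function name `table_to_avro_schema` are hypothetical and deliberately
incomplete; they are not taken from any particular ETL tool.

```python
import json

# Hypothetical, partial mapping from MySQL column types to Avro primitive types.
MYSQL_TO_AVRO = {
    "tinyint": "int",
    "int": "int",
    "bigint": "long",
    "float": "float",
    "double": "double",
    "varchar": "string",
    "text": "string",
    "blob": "bytes",
}


def table_to_avro_schema(table_name, columns):
    """Build an Avro record schema (as a dict) from a list of
    (column_name, mysql_type, nullable) tuples."""
    fields = []
    for name, mysql_type, nullable in columns:
        avro_type = MYSQL_TO_AVRO[mysql_type.lower()]
        if nullable:
            # Nullable columns become a union with "null" first, so the
            # default value null is valid for the union.
            fields.append({"name": name, "type": ["null", avro_type], "default": None})
        else:
            fields.append({"name": name, "type": avro_type})
    return {"type": "record", "name": table_name, "fields": fields}


users_schema = table_to_avro_schema(
    "users",
    [("id", "bigint", False), ("email", "varchar", True)],
)
schema_json = json.dumps(users_schema, indent=2)  # text to store with the nightly dump
```

Giving nullable columns a default is what lets a reader fill in a column
that is missing from older files.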

This has worked great so far because the tools I'm using make it easy to
load a directory tree of Avros with the same schema.

The issue is what to do when schema changes occur in the SQL database. I
believe column additions and deletions are handled automatically by the
Avro loaders I'm using, but I need to deal with a column rename.

My thinking is: I could bake the table schemas at time of ETL into the
Avros, for historical record, but then manually copy that schema out as
a "master" schema and apply it to all Avros for which it's appropriate;
then when a column rename occurs, go back and edit the master schema.
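
One way the "edit the master schema" step could look, as a sketch only. The
Avro specification allows a field in the reader's schema to list earlier
names under `aliases`, so files written with the old column name can still
be resolved. Whether a given loader honors aliases is an assumption to
verify. The helper `rename_field` and the example column names are
hypothetical.

```python
import copy


def rename_field(master_schema, old_name, new_name):
    """Return a copy of the master (reader) schema in which the field
    `old_name` is renamed to `new_name` and `old_name` is kept as an alias."""
    schema = copy.deepcopy(master_schema)  # leave the caller's schema untouched
    for field in schema["fields"]:
        if field["name"] == old_name:
            field["name"] = new_name
            # Preserve any aliases from earlier renames and add the old name.
            field["aliases"] = field.get("aliases", []) + [old_name]
            return schema
    raise KeyError("no field named %r in schema %r" % (old_name, schema.get("name")))


# Hypothetical master schema for a table whose "email" column was renamed.
master = {
    "type": "record",
    "name": "users",
    "fields": [
        {"name": "id", "type": "long"},
        {"name": "email", "type": ["null", "string"], "default": None},
    ],
}

updated = rename_field(master, "email", "email_address")
```

The schemas baked into each file stay untouched as the historical record;
only the master schema changes when a rename happens.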

I've never used an external schema before, so please correct me if I
misunderstand how they work.

Anyone have wisdom to share on this topic? I'd love to hear from anyone
who has done this, or has a better solution.

-Mason