Avro, mail # user - ETL in face of column renames


ETL in face of column renames
Mason 2013-05-22, 17:34
Dear list,

I have what I imagine is a standard setup: a web application generates
data in MySQL, which I want to analyze in Hadoop. I run a nightly
process to extract the tables of interest, convert them to Avro, and
write them to HDFS.

This has worked great so far because the tools I'm using make it easy to
load a directory tree of Avros with the same schema.

The issue is what to do when schema changes occur in the SQL database. I
believe column additions and deletions are handled automatically by the
Avro loaders I'm using, but I need to deal with a column rename.

My thinking is: I could bake the table schemas at time of ETL into the
Avros, for historical record, but then manually copy that schema out as
a "master" schema and apply it to all Avros for which it's appropriate;
then when a column rename occurs, go back and edit the master schema.

I've never used an external schema before, so please correct me if I
misunderstand how they work.
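
To make the idea concrete, here is a rough sketch in plain Python of what
I imagine applying a "master" schema would do. It does not use the Avro
library. It only imitates, under my own assumptions, how a reader schema
with field "aliases" (which the Avro specification defines) could map an
old column name onto a new one. The table name, the column names, and the
project() helper are all hypothetical.

```python
import json

# Hypothetical master (reader) schema. Assumption: the MySQL column
# formerly called "user_name" was renamed to "username", and "email"
# was added later with a null default.
MASTER_SCHEMA = json.loads("""
{
  "type": "record",
  "name": "users",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "username", "type": "string", "aliases": ["user_name"]},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
""")


def project(record, schema=MASTER_SCHEMA):
    """Map a record written under an older schema onto the master schema."""
    out = {}
    for field in schema["fields"]:
        # Try the current field name first, then each alias in order.
        for name in [field["name"]] + field.get("aliases", []):
            if name in record:
                out[field["name"]] = record[name]
                break
        else:
            # Neither the name nor any alias was found: use the default,
            # or fail if the master schema does not declare one.
            if "default" not in field:
                raise KeyError(f"no value or default for field {field['name']!r}")
            out[field["name"]] = field["default"]
    return out


# One row extracted before the rename, one extracted after it.
old_row = {"id": 1, "user_name": "mason"}
new_row = {"id": 2, "username": "alice", "email": "alice@example.com"}
```

With this, project(old_row) and project(new_row) both come back keyed by
"username", so files written before and after the rename could be loaded
side by side. When a rename happens, the only manual step would be adding
the old name to the aliases list in the master schema.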

Anyone have wisdom to share on this topic? I'd love to hear from anyone
who has done this, or has a better solution.

-Mason