Avro >> mail # user >> Avro mapred: How to avoid schema specification in job.xml?


Avro mapred: How to avoid schema specification in job.xml?
Hello,

I have been using Avro with Hadoop and Oozie for months now and I am very
happy with the results.

The only limitation I see now is that we have to specify Avro schemas in
workflow.xml (job.xml):
- avro.input.schema
- avro.output.schema
Since this information is already provided in the Mapper/Reducer signatures, I
see this as redundant. The schema is also present in all my serialized files,
which means that the schema is specified in 3 different places.

From an operational point of view, this is a pain, since any schema modification
(let's say a simple optional field added) forces me to update many job
files. This task is very error-prone, and since we have a large number of
jobs, it generates a lot of work.

The only solution I see now would be to find/replace in the build script,
but I hope to find a better solution, either by providing some generic schemas
in the job file or by deactivating schema validation in the job.
Any help will be appreciated!
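
For reference, the find/replace workaround could look roughly like the sketch
below. It is only an illustration under several assumptions: the job file is a
plain Hadoop-style <configuration> document without XML namespaces (a real
Oozie workflow.xml is namespaced and would need extra handling), the function
name inject_schemas is made up, and the schema text would come from a single
.avsc file kept in the build.

```python
import json
import xml.etree.ElementTree as ET


def inject_schemas(job_xml: str, schemas: dict) -> str:
    """Return job_xml with the <value> of each listed <property> replaced.

    `schemas` maps a property name (e.g. "avro.input.schema") to the schema
    as JSON text. Hypothetical helper, not part of Avro, Hadoop or Oozie.
    """
    # Re-serialise each schema compactly so every job file gets identical text;
    # json.loads also fails early if the schema file is not valid JSON.
    compact = {
        name: json.dumps(json.loads(text), separators=(",", ":"))
        for name, text in schemas.items()
    }
    root = ET.fromstring(job_xml)
    # Assumes un-namespaced <property><name/><value/></property> entries.
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in compact:
            value = prop.find("value")
            if value is None:
                value = ET.SubElement(prop, "value")
            value.text = compact[name]
    return ET.tostring(root, encoding="unicode")
```

A build step would read the schema once and apply it to every job file, e.g.
inject_schemas(job_text, {"avro.input.schema": schema_text,
"avro.output.schema": schema_text}). This still leaves the schema duplicated
in the generated files; it only moves the maintenance to a single source.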

--
Julien Muller