Avro, mail # user - Avro mapred: How to avoid schema specification in job.xml?


Re: Avro mapred: How to avoid schema specification in job.xml?
Scott Carey 2011-10-10, 18:09
I'm not all that familiar with how Oozie interacts with Avro.

The Job must set its avro.input.schema and avro.output.schema properties;
this can be done in code (see the unit tests in the Avro mapred project for
examples), and if you are using SpecificRecords and DataFiles the schema is
available to the code where necessary.
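
As a rough illustration of setting the two properties in code rather than in job.xml, here is a minimal sketch. It makes several assumptions: a plain java.util.Properties object stands in for the Hadoop JobConf so that it runs without Hadoop or Avro on the classpath, and the SchemaInCode class, the configure method, and the Event schema are hypothetical names used only for illustration. In an actual job, the AvroJob.setInputSchema and AvroJob.setOutputSchema helpers in the Avro mapred module would take the JobConf and a Schema object instead.

```java
import java.util.Properties;

// Hypothetical helper: sets both Avro schema properties in code instead of job.xml.
final class SchemaInCode {
    // Property names as given in the message above.
    static final String INPUT_SCHEMA = "avro.input.schema";
    static final String OUTPUT_SCHEMA = "avro.output.schema";

    // Assumption: Properties stands in for Hadoop's JobConf so this runs standalone.
    static Properties configure(Properties conf, String inputSchemaJson, String outputSchemaJson) {
        conf.setProperty(INPUT_SCHEMA, inputSchemaJson);
        conf.setProperty(OUTPUT_SCHEMA, outputSchemaJson);
        return conf;
    }

    public static void main(String[] args) {
        // Illustrative schema JSON; in a real job it would come from one shared source,
        // such as a generated SpecificRecord class or an .avsc file.
        String schema = "{\"type\":\"record\",\"name\":\"Event\",\"fields\":"
                + "[{\"name\":\"id\",\"type\":\"long\"}]}";
        Properties conf = configure(new Properties(), schema, schema);
        System.out.println(conf.getProperty(INPUT_SCHEMA));
    }
}
```

Because the schema is read from a single place at job-setup time, a schema change would not need to be repeated in each job file.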

On 10/10/11 5:41 AM, "Julien Muller" <[EMAIL PROTECTED]> wrote:

> Hello,
>
> I have been using avro with hadoop and oozie for months now and I am very
> happy with the results.
>
> The only point I see as a limitation now is that we specify Avro schemas in
> workflow.xml (job.xml):
> - avro.input.schema
> - avro.output.schema
> Since this info is already provided in Mapper/Reducer signatures, I see this
> as redundant. The schema is also present in all my serialized files, which
> means that the schema is specified in 3 different places.
>
> From an operational point of view, this is a pain, since any schema
> modification (let's say a simple optional field added) forces me to update
> many job files. This task is very error-prone, and since we have a large
> number of jobs, it generates a lot of work.
>
> The only solution I see now would be to find/replace in the build script, but
> I hope I could find a better solution by providing some generic schemas to the
> job file, or find a way to deactivate schema validation in the job. Any help
> will be appreciated!
>
> --
> Julien Muller