Avro, mail # user - Hi,all. How can I involve two avro files with different schema into one M/R job?


Re: Hi,all. How can I involve two avro files with different schema into one M/R job?
Harsh J 2011-03-18, 18:31
On Fri, Mar 18, 2011 at 11:38 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Is that what you're after?  Why would you need this?

Probably a niche case: a job in which I need to read from multiple
sources (perhaps even process them differently up through the Map
phase), with a separate reader schema for each source.

This could be custom-built easily, but I wondered whether general
use cases of Avro data files could benefit from such a feature.

Right now AvroJob.setInputSchema(...) sets the given schema as
"avro.input.schema" in the Job. My suggestion was to make it
something like /path/1+avro.input.schema, /path/2+avro.input.schema, so
that each record reader instantiated for the mappers (via MultipleInputs)
can pick up its own reader schema, since each reader learns its path
(e.g. /path/2) from its FileSplit.
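
A rough sketch of the key scheme described above, not actual Avro code.
Assumptions: java.util.Properties stands in for the Hadoop job
configuration, the class and method names (PerPathSchemas, keyFor,
setInputSchema, schemaFor) are hypothetical, and falling back to the
plain "avro.input.schema" key when no per-path entry matches is my own
choice for illustration.

```java
import java.util.Properties;

/** Sketch: per-path reader schemas stored under "<path>+avro.input.schema". */
final class PerPathSchemas {
    // The existing job-wide key mentioned in the message.
    static final String GLOBAL_KEY = "avro.input.schema";

    // Hypothetical per-path key, e.g. "/path/1+avro.input.schema".
    static String keyFor(String inputPath) {
        return inputPath + "+" + GLOBAL_KEY;
    }

    // Job-setup side: register a reader schema (as JSON text) for one input path.
    static void setInputSchema(Properties conf, String inputPath, String schemaJson) {
        conf.setProperty(keyFor(inputPath), schemaJson);
    }

    // Record-reader side: given the file path taken from the split, try the
    // path itself and then each parent directory; fall back to the global key.
    static String schemaFor(Properties conf, String splitPath) {
        String p = splitPath;
        while (!p.isEmpty()) {
            String schema = conf.getProperty(keyFor(p));
            if (schema != null) {
                return schema;
            }
            int slash = p.lastIndexOf('/');
            if (slash < 0) {
                break;
            }
            p = p.substring(0, slash);
        }
        return conf.getProperty(GLOBAL_KEY); // may be null if nothing was set
    }
}
```

With this, a reader handed a split for /path/2/part-00000.avro would
find the schema registered for /path/2, while a split under an
unregistered directory would get the job-wide schema. Paths are compared
as plain strings here, so a real version would need to normalize them
(scheme, authority, trailing slashes) first.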

--
Harsh J
http://harshj.com