Avro >> mail # user >> Hi,all. How can I involve two avro files with different schema into one M/R job?


Re: Hi,all. How can I involve two avro files with different schema into one M/R job?
On Fri, Mar 18, 2011 at 11:38 PM, Doug Cutting <[EMAIL PROTECTED]> wrote:
> Is that what you're after?  Why would you need this?

Probably a niche case: I would need to read from multiple sources in
one job (perhaps even processing them differently up through the Map
phase), with a separate reader schema for each source.

This could be custom-built easily, but I wondered whether general
use cases of Avro data files could benefit from such a thing.

Right now AvroJob.setInputSchema(...) sets the given schema as
"avro.input.schema" in the Job. My suggestion was to make the key
path-qualified, something like /path/1+avro.input.schema and
/path/2+avro.input.schema, so that each record reader instantiated for
a mapper (via MultipleInputs) can pick up its own reader schema, since
it learns its input path (e.g. /path/2) from its FileSplit.
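
To make the suggestion concrete, here is a minimal sketch of the
path-qualified lookup. It is not the AvroJob API: a plain Map stands in
for the job configuration so the example runs without Hadoop, and the
class and method names (PerPathSchemas, keyFor, setInputSchema,
readerSchemaFor) are hypothetical. It also assumes a record reader
would fall back to the job-wide "avro.input.schema" when no per-path
entry matches.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for the Hadoop job configuration.
class PerPathSchemas {
    // Key that AvroJob.setInputSchema(...) uses today, per the message above.
    static final String INPUT_SCHEMA_KEY = "avro.input.schema";

    // Builds the proposed path-qualified key, e.g. "/path/1+avro.input.schema".
    static String keyFor(String inputPath) {
        return inputPath + "+" + INPUT_SCHEMA_KEY;
    }

    // Registers a reader schema (as JSON text) for one input path.
    static void setInputSchema(Map<String, String> conf, String inputPath, String schemaJson) {
        conf.put(keyFor(inputPath), schemaJson);
    }

    // What a record reader might do with the file path from its split:
    // walk up the parent directories until a registered input path matches,
    // then fall back to the job-wide schema (an assumption of this sketch).
    static String readerSchemaFor(Map<String, String> conf, String splitPath) {
        String p = splitPath;
        while (p != null && !p.isEmpty()) {
            String schema = conf.get(keyFor(p));
            if (schema != null) {
                return schema;
            }
            int slash = p.lastIndexOf('/');
            p = slash > 0 ? p.substring(0, slash) : null;
        }
        return conf.get(INPUT_SCHEMA_KEY);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put(INPUT_SCHEMA_KEY, "\"string\"");        // job-wide default
        setInputSchema(conf, "/path/1", "\"long\"");     // schema for source 1
        setInputSchema(conf, "/path/2", "\"double\"");   // schema for source 2

        System.out.println(readerSchemaFor(conf, "/path/1/part-00000.avro"));
        System.out.println(readerSchemaFor(conf, "/path/2/part-00000.avro"));
        System.out.println(readerSchemaFor(conf, "/elsewhere/data.avro"));
    }
}
```

In a real job the same lookup would read from the job's configuration,
and the path would come from the FileSplit handed to each record
reader. Walking up the parent directories is one way to match a file
inside a registered input directory; an exact-match or glob-based
lookup would work too.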

--
Harsh J
http://harshj.com