Avro, mail # user - Avro Map-Reduce and ChainMapper


Re: Avro Map-Reduce and ChainMapper
Jeremy Lewi 2012-02-09, 03:33
You might also want to take a look at
https://github.com/cloudera/crunch/

Not sure what state it's in, but judging by the file names it might support
flume.

J

On Wed, Feb 8, 2012 at 10:04 AM, Scott Carey <[EMAIL PROTECTED]> wrote:

> I have not tried or tested ChainMapper with Avro myself.  It will probably
> work if you configure the input schemas or output schemas appropriately.
>  Take a look at what AvroJob.setInputSchema is doing; if you are familiar
> enough with hadoop's configuration you may be able to work it out.  Others
> likely know more than I do on this.
>
> Also, you may be interested in how things are done in this variation:
> https://github.com/wibidata/odiago-avro
>
>
> On 2/1/12 8:23 AM, "Andrew Kenworthy" <[EMAIL PROTECTED]> wrote:
>
> Hello,
>
> Is it possible to chain Avro MR jobs using the ChainMapper? I'm looking
> to chain two map tasks and a reducer, but haven't been able to find any
> examples:
>
> Chain summary:
> a) first map task: takes non-Avro input and produces K/V output in the
> form of AvroKey<Record>, NullWritable
> b) second map task: takes the output of the first task as its input [mapper
> extends AvroMapper<Record, Pair<Record, NullWritable>>]
> c) reducer: AvroReducer
>
> In particular, how would I specify the input and output schemas? Simply by
> calling AvroJob.setInputSchema/setOutputSchema on the individual chained
> job conf objects?
>
> Thanks,
>
> Andrew
>
>
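The chain Andrew describes (map a -> map b -> reduce) can only work if each stage's output schema matches the next stage's input schema, which is why the question of per-mapper schema settings comes up at all. The sketch below is a stdlib-only Java model of that idea under stated assumptions: it is not Hadoop's ChainMapper and not Avro's AvroJob, schemas are modeled as plain strings, and the names ChainSketch, Stage, validate and run are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical, stdlib-only model of "map -> map -> reduce" chaining.
// NOT Hadoop's ChainMapper and NOT Avro's AvroJob; schemas are plain strings.
public class ChainSketch {

    // One chained stage: a declared input/output "schema" plus its map function.
    record Stage(String inSchema, String outSchema, Function<String, String> fn) {}

    // Fails fast if a stage emits a schema the next stage does not expect.
    static void validate(List<Stage> stages) {
        for (int i = 0; i + 1 < stages.size(); i++) {
            String out = stages.get(i).outSchema();
            String in = stages.get(i + 1).inSchema();
            if (!out.equals(in)) {
                throw new IllegalStateException(
                    "stage " + i + " emits " + out + " but stage " + (i + 1) + " expects " + in);
            }
        }
    }

    // Pushes each input through every map stage in order, then "reduces"
    // by counting how many inputs ended up with each final key.
    static Map<String, Integer> run(List<Stage> stages, List<String> inputs) {
        validate(stages);
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : inputs) {
            for (Stage stage : stages) {
                value = stage.fn().apply(value);
            }
            counts.merge(value, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<Stage> chain = List.of(
            // map a: raw text -> normalized "Record"
            new Stage("text", "Record", line -> line.trim().toLowerCase()),
            // map b: "Record" -> its first field, used as the reduce key
            new Stage("Record", "Record", rec -> rec.split(",")[0]));
        System.out.println(run(chain, List.of("A,1", " a,2", "B,3"))); // prints {a=2, b=1}
    }
}
```

Changing the second stage's input schema to anything other than "Record" makes validate throw before any data is processed. In a real job the equivalent check would be making sure each chained mapper's private configuration carries schemas that agree with its neighbours.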