Re: how to specify MultipleOutputs, MultipleInputs in using Avro mapred API
If I understood your issue correctly, all you need to ensure is that both
your mappers emit the same key and value types. This can easily be done by
implementing a custom Avro mapper (one that reads records from Avro files,
processes them, and emits plain K/V types instead of Avro datums, such
that they match what your HBase mapper collects).

Your reducer shouldn't be bothered about avro/etc then.
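
To make the idea concrete, here is a rough sketch in plain Java. It only
simulates the data flow and deliberately uses no Hadoop, Avro or HBase
classes, so every name in it (LogRecord, Tagged, Pair, mapLogRecord,
mapHBaseRow, the "userId"/"url" fields) is made up for illustration and is
not part of any real API. The point is just that both mappers emit the same
key and value types, with a tag saying which side a value came from, so the
reducer can join without knowing anything about Avro:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join. All names are hypothetical.
class ReduceSideJoinSketch {

    // Stand-in for a record decoded from an Avro log file.
    static final class LogRecord {
        final String userId;
        final String url;
        LogRecord(String userId, String url) { this.userId = userId; this.url = url; }
    }

    // Common intermediate value type: a source tag plus a payload.
    static final class Tagged {
        final String source;
        final String payload;
        Tagged(String source, String payload) { this.source = source; this.payload = payload; }
    }

    // Common intermediate pair type emitted by BOTH mappers.
    static final class Pair {
        final String key;
        final Tagged value;
        Pair(String key, Tagged value) { this.key = key; this.value = value; }
    }

    // Stand-in for the custom Avro-side mapper: emits plain types, not a datum.
    static Pair mapLogRecord(LogRecord r) {
        return new Pair(r.userId, new Tagged("log", r.url));
    }

    // Stand-in for the HBase-side mapper: row key plus one column value.
    static Pair mapHBaseRow(String rowKey, String name) {
        return new Pair(rowKey, new Tagged("hbase", name));
    }

    // Stand-in for the shuffle: group values by key, keys in sorted order.
    static Map<String, List<Tagged>> shuffle(List<Pair> pairs) {
        Map<String, List<Tagged>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key, k -> new ArrayList<>()).add(p.value);
        }
        return grouped;
    }

    // Reducer: inner join of the two sides for one key, using only the tag.
    static List<String> reduce(String key, List<Tagged> values) {
        List<String> names = new ArrayList<>();
        List<String> urls = new ArrayList<>();
        for (Tagged v : values) {
            if (v.source.equals("hbase")) names.add(v.payload);
            else urls.add(v.payload);
        }
        List<String> joined = new ArrayList<>();
        for (String n : names) {
            for (String u : urls) joined.add(key + "\t" + n + "\t" + u);
        }
        return joined;
    }

    // Runs shuffle and reduce over the combined output of both mappers.
    static List<String> join(List<Pair> pairs) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Tagged>> e : shuffle(pairs).entrySet()) {
            out.addAll(reduce(e.getKey(), e.getValue()));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Pair> pairs = new ArrayList<>();
        pairs.add(mapHBaseRow("u1", "alice"));
        pairs.add(mapLogRecord(new LogRecord("u1", "/home")));
        pairs.add(mapLogRecord(new LogRecord("u2", "/home"))); // no HBase row for u2
        for (String line : join(pairs)) System.out.println(line);
    }
}
```

In a real job the two map functions would live in separate mapper classes,
one per input, and the tagged value would be whatever serializable type
your framework version expects; that part is left out here on purpose.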

* Note: You may also use avro as intermediate K/V format, but it might
require some extra work to do so :)

On Wed, Aug 18, 2010 at 6:45 PM, ey-chih chow <[EMAIL PROTECTED]> wrote:
> Hi,
> Let me rephrase my question to see if anybody is interested in answering it.
> In the new version, Avro 1.4.0, the class hierarchy of AvroMapper and
> AvroReducer has been changed: they now subclass Configured, rather than
> extending MapReduceBase and implementing the Mapper and Reducer interfaces
> respectively. The configuration of Avro mapred jobs is also different from
> that of other mapred jobs. Furthermore, text log files have to be imported
> into Avro format for Avro mapred jobs to process them. If I have a mapred job
> that requires a reducer-side join of two inputs, one from HBase and the
> other from an imported log file in Avro format, how can I configure
> the two mappers to process the inputs from HBase and the log file respectively?
> Also, how can I configure an Avro reducer to generate multiple outputs? For
> multiple inputs and outputs, I got some example programs from Tom White's
> Hadoop book, but I simply don't know what changes I should make for
> the Avro case.
> Ey-Chih
> ________________________________
> Subject: how to specify MultipleOutputs, MultipleInputs in using Avro mapred
> Date: Mon, 16 Aug 2010 18:22:24 -0700
> Hi,
> I have a Map/Reduce job that requires multiple inputs and outputs. One of the
> inputs will be processed by a mapper and a reducer that are subclasses of
> AvroMapper and AvroReducer respectively, and the reducer has multiple outputs.
> I would appreciate it if anybody could let me know how to configure the job
> to do this.
> Ey-Chih

Harsh J