Re: Basic question on how reducer works
Hi Harsh,

Thanks for clarifying. I had earlier thought that the Partitioner picks
the reducer.

My cluster setup allows multiple reducers, so I want to know when, and in
which scenarios, we should go for multiple reducers.

Cheers!
Manoj.
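
A minimal sketch of the point made in the quoted reply (a reducer == a
partition), assuming hash-based partitioning of the form
(hash & Integer.MAX_VALUE) % numReduceTasks. It is written in plain Python
rather than against the Hadoop API, the function names are hypothetical, and
java_string_hash only emulates Java's String.hashCode() for illustration
(Hadoop key types such as Text compute their own hash):

```python
def java_string_hash(s: str) -> int:
    """Emulate Java's String.hashCode() as a signed 32-bit integer.

    Assumption: characters are in the Basic Multilingual Plane, so each
    Python character corresponds to one Java char.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Reinterpret the unsigned 32-bit value as a signed one.
    return h - 0x100000000 if h >= 0x80000000 else h


def partition_for(key: str, num_reducers: int) -> int:
    """Return the partition ID for a key: clear the sign bit, then modulo."""
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers


def map_side_partition(records, num_reducers):
    """Tag each (key, value) record with a partition ID on the map side.

    Nothing here 'calls' a reducer; the records are only grouped by
    partition ID so that the reducer with the same ID can fetch them later.
    """
    buckets = {p: [] for p in range(num_reducers)}
    for key, value in records:
        buckets[partition_for(key, num_reducers)].append((key, value))
    return buckets


if __name__ == "__main__":
    records = [("a", 1), ("ab", 2), ("a", 3)]
    for partition_id, items in map_side_partition(records, 3).items():
        print(partition_id, items)
```

Every record with the same key lands in the same partition, so a single
reducer sees all values for that key, while different keys can be spread
across as many reducers as the job is configured with.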

On Mon, Jul 9, 2012 at 11:27 PM, Harsh J <[EMAIL PROTECTED]> wrote:

> Manoj,
>
> Think of it this way, and you shouldn't be confused: A reducer == a
> partition.
>
> For (1) - Partitioners do not 'call' a reduce; they just write the data
> with a proper partition ID. The reducer whose ID matches the partition
> ID picks it up for itself later. We have already explained this
> earlier.
>
> For (2) - For what scenario do you _not_ want multiple reducers
> handling each partition uniquely, when it is possible to scale that
> way?
>
> On Mon, Jul 9, 2012 at 11:22 PM, Manoj Babu <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > It would be helpful if you could give more details on the doubts below.
> >
> > 1. How does the partitioner know which reducer needs to be called?
> > 2. When we use more than one reducer, the output gets separated.
> > In what scenario do we actually have to go for multiple reducers?
> >
> > Cheers!
> > Manoj.
> >
> >
> >
> > On Mon, Jul 9, 2012 at 6:54 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
> >>
> >> Robert,
> >>
> >> On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
> >>
> >> Hi,
> >>
> >> I have some questions related to basic functionality in Hadoop.
> >>
> >> 1. When a Mapper processes the intermediate output data, how does it know
> >> how many partitions to create (how many reducers there will be) and how
> >> much data should go into each partition for each reducer?
> >>
> >> 2. When the JobTracker assigns a task to a reducer, it will also specify
> >> the locations of the intermediate output data that it should retrieve,
> >> right? But how will a reducer know which portion it has to retrieve from
> >> each remote location holding intermediate output?
> >>
> >>
> >> To add to Harsh's comment: essentially, the TaskTracker (TT) *knows* where
> >> the output of a given map-id/reduce-id pair is present via an
> >> output-file/index-file combination.
> >>
> >> Arun
> >>
> >> --
> >> Arun C. Murthy
> >> Hortonworks Inc.
> >> http://hortonworks.com/
> >>
> >>
> >
>
>
>
> --
> Harsh J
>
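
A simplified sketch of the output-file/index-file idea Arun describes in the
quoted reply: each map task's output is treated as one blob containing a
segment per partition, plus an index that records where each segment starts
and how long it is. The function names and the (offset, length) index layout
are assumptions for illustration only; Hadoop's actual on-disk format and
index records are more involved:

```python
def write_map_output(partitions):
    """Concatenate per-partition segments and build a matching index.

    partitions: list of bytes objects, where position i holds the data
    destined for reduce ID i. Returns (data, index), where index[i] is an
    (offset, length) pair describing segment i inside data.
    """
    data = bytearray()
    index = []
    for segment in partitions:
        index.append((len(data), len(segment)))
        data.extend(segment)
    return bytes(data), index


def fetch_segment(data, index, reduce_id):
    """Return only the portion of a map output that belongs to reduce_id."""
    offset, length = index[reduce_id]
    return data[offset:offset + length]


if __name__ == "__main__":
    # One map task's output for a job with three reducers.
    data, index = write_map_output([b"aa", b"", b"ccc"])
    print(index)
    print(fetch_segment(data, index, 2))
```

With a lookup like this, a request for a given map-id/reduce-id pair can be
answered by reading one index entry and serving just that byte range, so the
reducer never has to scan the rest of the map output.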