MapReduce, mail # user - Re: Basic question on how reducer works


Re: Basic question on how reducer works
Karthik Kambatla 2012-07-09, 18:12
Hi Manoj,

As Harsh said, we almost always need multiple reducers. Since each reduce
task can run on a different core (on the same machine or a different one),
in most cases we want at least as many reduce tasks as there are cores, for
maximum parallelism and performance.

Karthik
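
To make the "reducer == partition" point from the quoted thread concrete, here is a minimal sketch of how a hash-based partitioner could assign keys to partition IDs once more than one reducer is configured. The class and method names (PartitionSketch, partitionFor, route) are hypothetical and for illustration only; the formula mirrors the usual hash-and-modulo approach, and a job may plug in a different partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration; not part of the Hadoop API.
class PartitionSketch {

    // Map a key to a partition ID in [0, numReduceTasks).
    // Masking with Integer.MAX_VALUE clears the sign bit so that
    // keys with a negative hashCode still get a non-negative ID.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Group keys by partition ID. Bucket i holds the keys that
    // reducer i would later pick up for itself.
    static List<List<String>> route(List<String> keys, int numReduceTasks) {
        List<List<String>> buckets = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            buckets.add(new ArrayList<>());
        }
        for (String key : keys) {
            buckets.get(partitionFor(key, numReduceTasks)).add(key);
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<String> keys = Arrays.asList("apple", "banana", "cherry", "apple");
        List<List<String>> buckets = route(keys, 4);
        for (int i = 0; i < buckets.size(); i++) {
            System.out.println("partition " + i + " -> " + buckets.get(i));
        }
    }
}
```

The map side only tags each record with a partition ID; nothing "calls" a reducer. Every occurrence of the same key lands in the same partition, which is why the output is split per reducer when several are used.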

On Mon, Jul 9, 2012 at 11:07 AM, Manoj Babu <[EMAIL PROTECTED]> wrote:

> Hi Harsh,
>
> Thanks for clarifying. I had thought earlier that the Partitioner picks
> the reducer.
>
> My cluster setup provides options for multiple reducers, so I want to know
> when and in which scenarios we should go for multiple reducers?
>
> Cheers!
> Manoj.
>
>
>
> On Mon, Jul 9, 2012 at 11:27 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Manoj,
>>
>> Think of it this way, and you shouldn't be confused: A reducer == a
>> partition.
>>
>> For (1) - Partitioners do not 'call' a reduce; they just write the data
>> with a proper partition ID. The reducer whose ID matches the partition
>> ID picks it up for itself later. We have already explained this
>> earlier.
>>
>> For (2) - For what scenario do you _not_ want multiple reducers
>> handling each partition uniquely, when it is possible to scale that
>> way?
>>
>> On Mon, Jul 9, 2012 at 11:22 PM, Manoj Babu <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > It would be more helpful if you could give more details on the doubts
>> > below.
>> >
>> > 1. How does the partitioner know which reducer needs to be called?
>> > 2. When we use more than one reducer, the output gets separated.
>> > In which scenarios do we actually have to go for multiple reducers?
>> >
>> > Cheers!
>> > Manoj.
>> >
>> >
>> >
>> > On Mon, Jul 9, 2012 at 6:54 PM, Arun C Murthy <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Robert,
>> >>
>> >> On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have some questions related to basic functionality in Hadoop.
>> >>
>> >> 1. When a Mapper processes the intermediate output data, how does it
>> >> know how many partitions to create (i.e., how many reducers there will
>> >> be) and how much data goes into each partition for each reducer?
>> >>
>> >> 2. When the JobTracker assigns a task to a reducer, it will also
>> >> specify the locations of the intermediate output data that the reducer
>> >> should retrieve, right? But how will a reducer know which portion to
>> >> retrieve from each remote location holding intermediate output?
>> >>
>> >>
>> >> To add to Harsh's comment: essentially, the TT (TaskTracker) *knows*
>> >> where the output of a given map-id/reduce-id pair is present via an
>> >> output-file/index-file combination.
>> >>
>> >> Arun
>> >>
>> >> --
>> >> Arun C. Murthy
>> >> Hortonworks Inc.
>> >> http://hortonworks.com/
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
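
Arun's remark about the output-file/index-file combination can be illustrated with a small sketch. It assumes the map output is one file in which each partition's records sit in a contiguous segment, with an index recording where each segment starts and how long it is. The names MapOutputIndexSketch, Segment, buildIndex, and segmentFor are hypothetical; the real on-disk index format tracks more detail than shown here.

```java
// Hypothetical illustration; not the actual Hadoop index format or API.
class MapOutputIndexSketch {

    // One entry per partition: the byte range of that partition's
    // segment inside the single map output file.
    static final class Segment {
        final long offset;
        final long length;

        Segment(long offset, long length) {
            this.offset = offset;
            this.length = length;
        }
    }

    // Build the index from per-partition segment lengths, assuming
    // segments are written back to back in partition-ID order.
    static Segment[] buildIndex(long[] partitionLengths) {
        Segment[] index = new Segment[partitionLengths.length];
        long offset = 0;
        for (int p = 0; p < partitionLengths.length; p++) {
            index[p] = new Segment(offset, partitionLengths[p]);
            offset += partitionLengths[p];
        }
        return index;
    }

    // A reducer with ID reduceId only needs its own entry: the server
    // holding the map output can then send just that byte range.
    static Segment segmentFor(Segment[] index, int reduceId) {
        return index[reduceId];
    }

    public static void main(String[] args) {
        Segment[] index = buildIndex(new long[] {100, 0, 250});
        Segment s = segmentFor(index, 2);
        System.out.println("reducer 2 reads " + s.length
                + " bytes starting at offset " + s.offset);
    }
}
```

Under these assumptions, a reducer never has to scan a whole map output: given the map ID and its own reduce ID, the index lookup identifies exactly the portion it must fetch.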