MapReduce >> mail # user >> Re: Basic question on how reducer works


Harsh J 2012-07-09, 01:16
Grandl Robert 2012-07-09, 01:27
Pavan Kulkarni 2012-07-09, 02:56
Harsh J 2012-07-09, 03:38
Pavan Kulkarni 2012-07-09, 04:11
Grandl Robert 2012-07-08, 01:37
Harsh J 2012-07-08, 05:34
Arun C Murthy 2012-07-09, 13:24
Manoj Babu 2012-07-09, 17:52
Harsh J 2012-07-09, 17:57
Manoj Babu 2012-07-09, 18:07
Re: Basic question on how reducer works
Hi Manoj,

As Harsh said, we would almost always want multiple reducers. Since each
reduce task potentially executes on a different core (on the same machine
or a different one), in most cases we want at least as many reduce tasks
as there are available cores, for maximum parallelism and performance.

Karthik
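
For reference, the number of reduce tasks (and hence partitions) is chosen
per job. A minimal sketch using the org.apache.hadoop.mapreduce Job API
(the class name, the paths, and the count of 8 are only placeholders):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class ReducerCountExample {
  public static void main(String[] args) throws Exception {
    Job job = new Job(new Configuration(), "reducer-count-example");
    job.setJarByClass(ReducerCountExample.class);
    // Mapper/Reducer classes and output types are omitted here; the
    // identity mapper and reducer are used by default.
    // One output file (part-r-NNNNN) is written per reduce task, so pick
    // a count that keeps the cluster's reduce slots/cores busy.
    job.setNumReduceTasks(8);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

For jobs that implement Tool, the same thing can be set from the command
line with -D mapred.reduce.tasks=8.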

On Mon, Jul 9, 2012 at 11:07 AM, Manoj Babu <[EMAIL PROTECTED]> wrote:

> Hi Harsh,
>
> Thanks for clarifying. I was under the impression earlier that the
> Partitioner picks the reducer.
>
> My cluster setup allows multiple reducers, so I want to know when and
> in which scenarios we should go for multiple reducers.
>
> Cheers!
> Manoj.
>
>
>
> On Mon, Jul 9, 2012 at 11:27 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> Manoj,
>>
>> Think of it this way, and you shouldn't be confused: a reducer == a
>> partition.
>>
>> For (1) - Partitioners do not 'call' a reducer; they just write the
>> data with the proper partition ID. The reducer whose ID matches that
>> partition ID picks it up for itself later [see the Partitioner sketch
>> after the quoted thread below]. This we have already explained
>> earlier.
>>
>> For (2) - For what scenario do you _not_ want multiple reducers
>> handling each partition uniquely, when it is possible to scale that
>> way?
>>
>> On Mon, Jul 9, 2012 at 11:22 PM, Manoj Babu <[EMAIL PROTECTED]> wrote:
>> > Hi,
>> >
>> > It would be more helpful if you could give more details on the
>> > doubts below.
>> >
>> > 1. How does the partitioner know which reducer needs to be called?
>> > 2. When we use more than one reducer, the output gets split across
>> > them. For what scenarios do we actually have to go for multiple
>> > reducers?
>> >
>> > Cheers!
>> > Manoj.
>> >
>> >
>> >
>> > On Mon, Jul 9, 2012 at 6:54 PM, Arun C Murthy <[EMAIL PROTECTED]>
>> wrote:
>> >>
>> >> Robert,
>> >>
>> >> On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:
>> >>
>> >> Hi,
>> >>
>> >> I have some questions related to basic functionality in Hadoop.
>> >>
>> >> 1. When a Mapper processes the intermediate output data, how does
>> >> it know how many partitions to create (i.e. how many reducers there
>> >> will be) and how much data should go into each partition for each
>> >> reducer?
>> >>
>> >> 2. When the JobTracker assigns a task to a reducer, it also
>> >> specifies the locations of the intermediate output data it should
>> >> retrieve, right? But how does a reducer know, from each remote
>> >> location holding intermediate output, which portion it alone has to
>> >> retrieve?
>> >>
>> >>
>> >> To add to Harsh's comment: essentially the TT (TaskTracker) *knows*
>> >> where the output for a given map-id/reduce-id pair is, via an
>> >> output-file/index-file combination [a simplified sketch of this
>> >> index idea also follows the quoted thread below].
>> >>
>> >> Arun
>> >>
>> >> --
>> >> Arun C. Murthy
>> >> Hortonworks Inc.
>> >> http://hortonworks.com/
>> >>
>> >>
>> >
>>
>>
>>
>> --
>> Harsh J
>>
>
>
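
To illustrate Harsh's point that a partitioner only assigns a partition ID
(one ID per reducer), here is a minimal custom Partitioner sketch. The key
and value types and the hash-modulo scheme mirror what Hadoop's default
HashPartitioner does; the class itself is just an example, and would be
wired in with job.setPartitionerClass(ExamplePartitioner.class):

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Maps each map-output key to a partition ID in [0, numPartitions).
// numPartitions equals the job's number of reduce tasks, so reducer i
// later fetches exactly the records labelled with ID i.
public class ExamplePartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // Mask the sign bit so the result is never negative, then take the
    // key's hash modulo the number of reducers (the same idea as
    // HashPartitioner).
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

And to illustrate Arun's output-file/index-file point: each map task writes
one sorted output file plus an index recording, for every reduce partition,
where that partition's segment starts and how long it is, so a reducer
fetches only its own slice from each map's output. The classes below are a
simplified illustration of that idea, not Hadoop's actual internals:

import java.util.List;

// Simplified illustration only, not Hadoop's real map-output index code.
class SegmentLocation {
  final long startOffset; // byte offset of the segment in the map output file
  final long length;      // number of bytes belonging to that partition

  SegmentLocation(long startOffset, long length) {
    this.startOffset = startOffset;
    this.length = length;
  }
}

class MapOutputIndex {
  // One entry per reduce partition, stored in partition-ID order.
  private final List<SegmentLocation> segments;

  MapOutputIndex(List<SegmentLocation> segments) {
    this.segments = segments;
  }

  // A reducer asks only for its own partition's slice of this map's output.
  SegmentLocation segmentFor(int reduceId) {
    return segments.get(reduceId);
  }
}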
Grandl Robert 2012-07-09, 19:55
Arun C Murthy 2012-07-09, 20:33
Grandl Robert 2012-07-10, 03:15
Karthik Kambatla 2012-07-10, 03:33
Subir S 2012-07-10, 15:29
Harsh J 2012-07-14, 06:08
Subir S 2012-07-14, 12:00
Harsh J 2012-07-14, 13:55
Subir S 2012-07-16, 20:31
Subir S 2012-07-14, 05:49