MapReduce >> mail # user >> Re: Basic question on how reducer works


Re: Basic question on how reducer works
Robert,

On Jul 7, 2012, at 6:37 PM, Grandl Robert wrote:

> Hi,
>
> I have some questions related to basic functionality in Hadoop.
>
> 1. When a Mapper processes the intermediate output data, how does it know how many partitions to create (i.e., how many reducers there will be) and how much data goes into each reducer's partition?
>
> 2. When the JobTracker assigns a task to a reducer, it also specifies the locations of the intermediate output data the reducer should retrieve, right? But how does the reducer know which portion to retrieve from each remote location that holds intermediate output?

To add to Harsh's comment: essentially, the TT (TaskTracker) *knows* where the output for a given map-id/reduce-id pair is located, via an output-file/index-file combination.
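
As a rough illustration of that combination, here is a minimal in-memory sketch, not the actual TaskTracker code. It assumes hash partitioning of the form used by Hadoop's default HashPartitioner, and it simplifies the on-disk layout to one byte array (the "output file") plus one offset/length entry per reducer (the "index file"). The names MapOutputSketch, IndexEntry, partitionFor and fetch are hypothetical and exist only for this example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: one map task's output plus its index, held in memory.
public class MapOutputSketch {

    /** One index entry per reducer: where that reducer's slice sits in the output. */
    static final class IndexEntry {
        final int offset;
        final int length;
        IndexEntry(int offset, int length) { this.offset = offset; this.length = length; }
    }

    /** Hash partitioning: a given key always maps to the same reducer. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    final byte[] outputFile;      // all partitions, concatenated in partition order
    final IndexEntry[] indexFile; // indexFile[r] locates reducer r's slice

    MapOutputSketch(List<String> keys, int numReducers) {
        // Group the records by target reducer (assumption: records are just keys).
        List<List<String>> buckets = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) buckets.add(new ArrayList<>());
        for (String key : keys) buckets.get(partitionFor(key, numReducers)).add(key);

        // Write partitions back to back, recording where each one starts and ends.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        indexFile = new IndexEntry[numReducers];
        for (int r = 0; r < numReducers; r++) {
            int start = out.size();
            for (String key : buckets.get(r)) {
                byte[] rec = (key + "\n").getBytes(StandardCharsets.UTF_8);
                out.write(rec, 0, rec.length);
            }
            indexFile[r] = new IndexEntry(start, out.size() - start);
        }
        outputFile = out.toByteArray();
    }

    /** Serving side of a (map, reduce) request: return only that reducer's slice. */
    byte[] fetch(int reducer) {
        IndexEntry e = indexFile[reducer];
        return Arrays.copyOfRange(outputFile, e.offset, e.offset + e.length);
    }

    public static void main(String[] args) {
        MapOutputSketch map =
            new MapOutputSketch(Arrays.asList("apple", "banana", "cherry", "date"), 2);
        for (int r = 0; r < 2; r++) {
            String slice = new String(map.fetch(r), StandardCharsets.UTF_8);
            System.out.println("reducer " + r + " fetches: " + slice.trim().replace("\n", ", "));
        }
    }
}
```

The point of the sketch is that the reducer never needs to know offsets itself: it asks for its own partition of a given map's output, and the side holding the data uses the index to return just that portion.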

Arun

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/