

Jim Twensky 2012-10-05, 16:31
Re: Chaining Multiple Reducers: Reduce -> Reduce -> Reduce
Hey Jim,

Are you looking to re-sort or re-partition your data by a different
key or key combo after each output from reduce?

On Fri, Oct 5, 2012 at 10:01 PM, Jim Twensky <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I have a complex Hadoop job that iterates over large graph data
> multiple times until some convergence condition is met. I know that
> the map output goes to the local disk of each particular mapper first,
> and is then fetched by the reducers before the reduce tasks start. I can
> see that this is an overhead, and in theory we can ship the data
> directly from mappers to reducers, without serializing on the local
> disk first. I understand that this step is necessary for fault
> tolerance and it is an essential building block of MapReduce.
>
> In my application, the map process consists of identity mappers which
> read the input from HDFS and ship it to reducers. Essentially, what I
> am doing is applying chains of reduce jobs until the algorithm
> converges. My question is, can I bypass the serialization of the local
> data and ship it from mappers to reducers immediately (as soon as I
> call context.write() in my mapper class)? If not, are there any other
> MR platforms that can do this? I've been searching around and couldn't
> see anything similar to what I need. Hadoop On Line is a prototype and
> has some similar functionality but it hasn't been updated for a while.
>
> Note: I know about ChainMapper and ChainReducer classes but I don't
> want to chain multiple mappers in the same local node. I want to chain
> multiple reduce functions globally so the data flow looks like: Map ->
> Reduce -> Reduce -> Reduce, which means each reduce operation is
> followed by a shuffle and sort, essentially bypassing the map
> operation.

--
Harsh J
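
For readers following the thread, the Map -> Reduce -> Reduce -> Reduce
flow described in the quoted message can be sketched as a single-process
simulation in plain Python. This is not Hadoop code and it does not address
the local-disk question; it only illustrates a shuffle-and-sort followed by
a reduce, repeated with an identity map until the output stops changing.
The connected-components reducer, the record layout, and every function
name below are assumptions made for illustration, not anything taken from
the original job.

```python
from collections import defaultdict


def shuffle_and_sort(pairs):
    """Group (key, value) pairs by key and return groups in key order."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return [(key, groups[key]) for key in sorted(groups)]


def reduce_round(pairs, reducer):
    """One pass with an identity map: shuffle, then reduce each key group."""
    out = []
    for key, values in shuffle_and_sort(pairs):
        out.extend(reducer(key, values))
    return out


def iterate_until_converged(pairs, reducer, max_rounds=50):
    """Feed each round's output into the next until nothing changes.

    Comparing whole outputs is a simplification for this sketch.
    """
    current = sorted(pairs)
    for rounds in range(1, max_rounds + 1):
        nxt = sorted(reduce_round(current, reducer))
        if nxt == current:
            return current, rounds
        current = nxt
    return current, max_rounds


def min_label_reducer(node, values):
    """Hypothetical reducer: keep the smallest label seen, pass it on."""
    neighbors = ()
    best = node
    for kind, payload in values:
        if kind == "adj":
            neighbors = payload
        else:
            best = min(best, payload)
    yield node, ("adj", neighbors)    # carry the graph structure forward
    yield node, ("label", best)       # this node's current label
    for nbr in neighbors:
        yield nbr, ("label", best)    # re-keyed by neighbor for the next shuffle


def initial_records(adjacency):
    """Build the first-round input: adjacency plus a self label per node."""
    records = []
    for node, neighbors in adjacency.items():
        records.append((node, ("adj", tuple(neighbors))))
        records.append((node, ("label", node)))
    return records


def component_labels(records):
    """Collapse the label records to one (smallest) label per node."""
    labels = {}
    for node, (kind, payload) in records:
        if kind == "label":
            labels[node] = min(payload, labels.get(node, payload))
    return labels


graph = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
final, rounds = iterate_until_converged(initial_records(graph), min_label_reducer)
print(component_labels(final))
```

With this toy graph, nodes 1 to 3 end up with label 1 and nodes 4 and 5
with label 4. Each round re-keys its output by neighbor, which is why a
fresh shuffle is needed between reduces.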
Jim Twensky 2012-10-05, 17:43
Harsh J 2012-10-05, 17:54
Jim Twensky 2012-10-05, 18:02
Bertrand Dechoux 2012-10-08, 10:39
Fabio Pitzolu 2012-10-08, 10:44
Bertrand Dechoux 2012-10-08, 10:51
Jim Twensky 2012-10-08, 19:09
Michael Segel 2012-10-08, 19:19