Hadoop, mail # user - ChainMapper and ChainReducer: Are the key/value pairs distributed to the nodes of the cluster before each Map phase?


Re: ChainMapper and ChainReducer: Are the key/value pairs distributed to the nodes of the cluster before each Map phase?
Rahul Jain 2011-04-29, 18:55
Your latter statement is correct:

> if the output of the Map1 phase (or Reduce phase) is immediately inserted
> to Map2 phase (or Map3 Phase) within the same node, without any
> distribution.

ChainMapper and ChainReducer are just convenience classes that let you reuse
mapper code, whether it runs as part of a chain or standalone. They do not
force the system to do any additional distribution, grouping, or sorting
between the chained stages.
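
To make the "no distribution between chained mappers" point concrete, here is a
minimal plain-Java sketch of the idea. It is a simplified model, not Hadoop's
actual implementation: the names ChainSketch, Stage, runChain and demo are
hypothetical and use no Hadoop classes. Each pair emitted by one stage is
handed straight to the next stage by a method call in the same process, with
no partitioning, sorting, or network transfer in between.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

public class ChainSketch {
    /** Hypothetical stand-in for a map stage: takes one pair, emits zero or more pairs. */
    interface Stage {
        void map(String key, Integer value, BiConsumer<String, Integer> out);
    }

    /**
     * Runs the stages back to back. Whatever stage `index` emits is passed
     * directly to stage `index + 1`; only the last stage's output reaches finalOut.
     */
    static void runChain(List<Stage> stages, int index, String key, Integer value,
                         BiConsumer<String, Integer> finalOut) {
        if (index == stages.size()) {
            finalOut.accept(key, value);
            return;
        }
        stages.get(index).map(key, value,
                (k, v) -> runChain(stages, index + 1, k, v, finalOut));
    }

    /** Feeds three sample pairs (made up for illustration) through Map1 -> Map2. */
    static List<String> demo() {
        Stage map1 = (k, v, out) -> out.accept(k.toUpperCase(), v);         // rewrite the key
        Stage map2 = (k, v, out) -> { if (v > 1) out.accept(k, v * 10); };  // filter and scale
        List<Stage> chain = Arrays.asList(map1, map2);

        List<String> results = new ArrayList<>();
        String[] keys = {"a", "b", "c"};
        int[] values = {1, 2, 3};
        for (int i = 0; i < keys.length; i++) {
            runChain(chain, 0, keys[i], values[i], (k, v) -> results.add(k + "=" + v));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [B=20, C=30]
    }
}
```

In this model the only place where pairs would be redistributed across nodes
is the shuffle before the Reduce step, which the sketch leaves out entirely.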

-Rahul

2011/4/29 Panayotis Antonopoulos <[EMAIL PROTECTED]>

>
> Hello,
> Let's say we have an MR job that uses ChainMapper and ChainReducer like in
> the following diagram:
> Input->Map1->Map2->Reduce->Map3->Output
>
> The input is split and distributed to the nodes of the cluster before being
> processed by Map1 phase.
> Also, before the Reduce phase the key/value pairs are distributed to
> the Reducers according to the partitions made by the Partitioner.
>
> I expected that the same thing (distribution of the keys) would happen
> before Map2 and Map3 phases but after reading "Pro Hadoop" Book I strongly
> doubt it.
>
> I would like to ask you if the key/value pairs emitted by the Map1 phase
> (or those emitted by the Reduce phase) are distributed to the nodes of the
> cluster before being processed by the next Map phase,
> or if the output of the Map1 phase (or Reduce phase) is immediately
> inserted to Map2 phase (or Map3 Phase) within the same node, without any
> distribution.
>
> Thank you in advance!
> Panagiotis Antonopoulos
>