Re: ChainMapper and ChainReducer: Are the key/value pairs distributed to the nodes of the cluster before each Map phase?
Rahul Jain 2011-04-29, 18:55
Your latter statement is correct:
> if the output of the Map1 phase (or Reduce phase) is immediately inserted
> to Map2 phase (or Map3 Phase) within the same node, without any
ChainMapper and ChainReducer are just convenience classes that let you reuse
mapper code whether it runs as part of a chain or standalone. They do not
force the system to do any additional distribution, grouping, or sorting:
each chained mapper runs in the same task as the stage before it and receives
that stage's output directly.
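
To make this concrete, here is a rough sketch in plain Java of how I picture the hand-off between chained mappers. It is only a toy model, not Hadoop's actual implementation: ChainSketch, PairMapper and runChain are names I made up for illustration, and no Hadoop classes are used. Each pair emitted by one mapper is passed straight to the next mapper in the same call stack, so nothing is partitioned, sorted or grouped in between.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiConsumer;

// Toy model only: all names here are hypothetical, none come from Hadoop.
class ChainSketch {

    /** Stand-in for a mapper: consumes one pair and emits zero or more pairs. */
    interface PairMapper {
        void map(String key, Integer value, BiConsumer<String, Integer> out);
    }

    /**
     * Feeds one pair through the chain starting at position index.
     * Whatever a mapper emits is handed directly to the next mapper,
     * in the same thread, with no partitioning, sorting or grouping.
     */
    static void runChain(List<PairMapper> chain, int index, String key,
                         Integer value, BiConsumer<String, Integer> sink) {
        if (index == chain.size()) {
            sink.accept(key, value); // end of chain: this is the task's output
            return;
        }
        chain.get(index).map(key, value,
                (k, v) -> runChain(chain, index + 1, k, v, sink));
    }

    static List<String> demo() {
        PairMapper map1 = (k, v, out) -> out.accept(k.toLowerCase(), v); // "Map1"
        PairMapper map2 = (k, v, out) -> out.accept(k, v * 10);          // "Map2"
        List<PairMapper> chain = Arrays.asList(map1, map2);

        List<String> emitted = new ArrayList<>();
        BiConsumer<String, Integer> sink = (k, v) -> emitted.add(k + "=" + v);

        // Records arrive in the order of the input split, keys unsorted.
        runChain(chain, 0, "B", 2, sink);
        runChain(chain, 0, "A", 3, sink);
        runChain(chain, 0, "C", 1, sink);
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [b=20, a=30, c=10]
    }
}
```

The output keeps the input order (b, a, c) instead of coming out sorted by key, which is the point: in this model the only place where keys would be redistributed and sorted is the regular shuffle between the map side and the reduce side.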
2011/4/29 Panayotis Antonopoulos <[EMAIL PROTECTED]>
> Let's say we have a MR job that uses ChainMapper and ChainReducer like in
> the following diagram:
> The input is split and distributed to the nodes of the cluster before being
> processed by Map1 phase.
> Also, before the Reduce phase, the key/value pairs are distributed to
> the Reducers according to the partitions made by the Partitioner.
> I expected that the same thing (distribution of the keys) would happen
> before the Map2 and Map3 phases, but after reading the "Pro Hadoop" book I
> strongly doubt it.
> I would like to ask you if the key/value pairs emitted by the Map1 phase
> (or those emitted by the Reduce phase) are distributed to the nodes of the
> cluster before being processed by the next Map phase,
> or if the output of the Map1 phase (or Reduce phase) is immediately
> inserted to Map2 phase (or Map3 Phase) within the same node, without any
> Thank you in advance!
> Panagiotis Antonopoulos