Also, one more shuffle and sort phase would have to occur so that you can
merge/combine the map outputs properly.
So you should decide whether the additional shuffle and sort phase will be
more overhead than the per-node combine saves.
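
To make the trade-off concrete, here is a rough sketch in plain Python (not Hadoop code). It assumes a word-count style job; the three map outputs and the combine() helper are made up for illustration. It only counts how many records would leave the node in each case and does not model the cost of the extra merge pass itself.

```python
from collections import Counter


def combine(pairs):
    """Sum the values per key, as a word-count combiner would (hypothetical helper)."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    # Sorting stands in for the sort that precedes a merge.
    return sorted(totals.items())


# Hypothetical output of three map tasks that ran on the same node.
map_outputs = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]

# Per-map combine: each map task's output is combined on its own.
per_map = [combine(out) for out in map_outputs]
records_per_map = sum(len(out) for out in per_map)

# Node-level combine: an extra pass that merges the outputs of all
# map tasks on the node before anything is sent to the reducers.
node_level = combine(pair for out in per_map for pair in out)
records_node_level = len(node_level)

print("records leaving node, per-map combine:", records_per_map)
print("records leaving node, node-level combine:", records_node_level)
```

With this toy input the node-level pass halves the number of records, but only because the three maps share keys. If the maps on a node rarely share keys, the extra pass buys little.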
On Wed, Jan 2, 2013 at 6:14 PM, Mahesh Balija <[EMAIL PROTECTED]> wrote:
> Hi Suresh,
> The combiner function aggregates the data from a single
> map instance, but NOT from all the maps running on a given node.
> AFAIK, since the maps run in individual child JVMs, the
> intermediate data would still need to be serialized (moved) so that
> your combiner can aggregate the data at node level.
> Mahesh Balija,
> Calsoft Labs.
> On Wed, Jan 2, 2013 at 5:53 PM, Suresh S <[EMAIL PROTECTED]> wrote:
>> I think running the combiner function at node level (to combine the
>> output of all the map tasks on the node) may reduce the intermediate
>> data movement. I don't know whether this technique is already
>> available. Is it worth working in this direction?
>> Any suggestions? Thanks in advance.
>> *Research Scholar,*
>> *Department of Computer Applications,*
>> *National Institute of Technology,*
>> *Tiruchirappalli - 620015.*