Re: running Combiner for all the task on the node.
Suresh S 2013-01-02, 13:04
It will definitely add some overhead. However, this leads to less intermediate
data movement and less time in the reduce phase. This benefit may improve the
*Department of Computer Applications,*
*National Institute of Technology,*
*Tiruchirappalli - 620015.*
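
To make the trade-off discussed in this thread concrete, here is a minimal sketch in plain Python. It is not Hadoop API code: the sample data and the `combine` helper are hypothetical, and a word-count style sum is assumed as the combine operation. It only counts how many intermediate records would leave the node in each case, and it ignores the cost of moving data between the child JVMs.

```python
from collections import Counter


def combine(pairs):
    """Sum the counts for each key, as a word-count combiner would."""
    totals = Counter()
    for key, count in pairs:
        totals[key] += count
    return sorted(totals.items())


# Hypothetical output of three map tasks that ran on the same node.
map_outputs = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("c", 1)],
    [("b", 1), ("b", 1), ("a", 1)],
]

# Usual behaviour: the combiner runs on each map task's output separately.
per_task = [combine(out) for out in map_outputs]
records_per_task = sum(len(out) for out in per_task)

# Proposed behaviour: combine the per-task results once more at node level.
per_node = combine(pair for out in per_task for pair in out)
records_per_node = len(per_node)

print(records_per_task, records_per_node)
```

With this sample data the node-level pass halves the number of records sent to the reducers. Whether that saving outweighs the extra merge step depends on how many keys the map tasks on a node actually share.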
On Wed, Jan 2, 2013 at 6:20 PM, Mahesh Balija <[EMAIL PROTECTED]> wrote:
> Also, one more shuffle and sort phase would have to occur so that you can
> merge/combine them properly.
> So you should decide whether the additional shuffle and sort phase costs
> more overhead than combining per node saves.
> Mahesh Balija,
> Calsoft Labs.
> On Wed, Jan 2, 2013 at 6:14 PM, Mahesh Balija <[EMAIL PROTECTED]> wrote:
> > Hi Suresh,
> > The combiner function aggregates the data from a single
> > map instance, but NOT across all the maps running on a given node.
> > AFAIK, since the maps run in individual child JVMs, the
> > intermediate data would still need to be serialized (moved) so that
> > your combiner can aggregate the data at node level.
> > Best,
> > Mahesh Balija,
> > Calsoft Labs.
> > On Wed, Jan 2, 2013 at 5:53 PM, Suresh S <[EMAIL PROTECTED]> wrote:
> >> Hello,
> >> I think running the combiner function at node level (to combine all
> >> map task output on the node) may reduce the intermediate data movement.
> >> I don't know whether this technique is already available. Is it
> >> worth working in this direction?
> >> Any suggestions? Thanks in advance.
> >> *Regards*
> >> *S.Suresh,*
> >> *Research Scholar,*
> >> *Department of Computer Applications,*
> >> *National Institute of Technology,*
> >> *Tiruchirappalli - 620015.*
> >> *+91-9941506562*