MapReduce, mail # dev - running Combiner for all the tasks on the node.


Suresh S 2013-01-02, 12:23
Harsh J 2013-01-02, 17:48
Mahesh Balija 2013-01-02, 12:44
Mahesh Balija 2013-01-02, 12:50
Suresh S 2013-01-02, 13:04
Re: running Combiner for all the task on the node.
Mahesh Balija 2013-01-02, 14:06
Hi Suresh,

           I mean, in the current approach the combine phase occurs per mapper
instance only, NOT per node.
           I think an additional local shuffle-and-sort phase would have to
happen so that you can combine per node.

           But I am NOT really sure whether this is achievable. You can give it
a try and see whether you run into any complications.

Thanks,
Mahesh Balija,
Calsoft Labs.
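Mahesh's point is that a node-level combine needs an extra local merge step, because each map task's combiner only sees that task's own output. A minimal Python sketch of that step is shown below, assuming a word-count job whose per-map outputs are already sorted by key. The names `map_task_output`, `combine_sorted`, and `node_level_combine` are hypothetical and are not part of the Hadoop API; the sketch also ignores that, in Hadoop, these outputs belong to separate child JVMs and would have to be serialized to local disk before such a merge could read them.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[str, int]


def combine_sorted(records: Iterable[Record],
                   combine: Callable[[List[int]], int]) -> Iterator[Record]:
    """Apply a combine function to a key-sorted stream, yielding one record per key."""
    for key, group in groupby(records, key=itemgetter(0)):
        yield key, combine([value for _, value in group])


def map_task_output(words: List[str]) -> List[Record]:
    """Hypothetical word-count map task: emit (word, 1), sort by key, then run the per-map combiner."""
    emitted = sorted((word, 1) for word in words)
    return list(combine_sorted(emitted, sum))


def node_level_combine(map_outputs: List[List[Record]]) -> List[Record]:
    """Hypothetical extra local phase: k-way merge the sorted per-map outputs
    of one node (the local shuffle and sort), then combine again."""
    merged = heapq.merge(*map_outputs, key=itemgetter(0))
    return list(combine_sorted(merged, sum))


# Two map tasks assumed to run on the same node.
node_maps = [
    map_task_output("a b a c".split()),
    map_task_output("b c c d".split()),
]
per_map = sum(len(output) for output in node_maps)
per_node = node_level_combine(node_maps)
print(per_map, per_node)  # 6 [('a', 2), ('b', 2), ('c', 3), ('d', 1)]
```

Here the two map tasks emit 6 combined records between them, and the node-level pass reduces that to 4. Re-combining is only safe because `sum` is associative and commutative, which is the same property a Hadoop combiner is expected to have.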

On Wed, Jan 2, 2013 at 6:34 PM, Suresh S <[EMAIL PROTECTED]> wrote:

> Definitely it will cost some overhead. But it leads to less intermediate
> data movement and less time in the reduce phase. This benefit may improve
> the overall performance.
> *Regards*
> *S.Suresh,*
> *Research Scholar,*
> *Department of Computer Applications,*
> *National Institute of Technology,*
> *Tiruchirappalli - 620015.*
> *+91-9941506562*
>
>
> On Wed, Jan 2, 2013 at 6:20 PM, Mahesh Balija <[EMAIL PROTECTED]> wrote:
>
> > Continued,
> >
> > Also, one more shuffle-and-sort phase would have to occur so that you
> > can merge/combine them properly.
> > So you should weigh whether the overhead of that additional
> > shuffle-and-sort phase is worth the benefit of combining per node.
> >
> > Best,
> > Mahesh Balija,
> > Calsoft Labs.
> >
> > On Wed, Jan 2, 2013 at 6:14 PM, Mahesh Balija <[EMAIL PROTECTED]> wrote:
> >
> > > Hi Suresh,
> > >
> > >                The combiner function aggregates the data from a single
> > > map instance, NOT from all the maps running on a given node.
> > >                AFAIK, since the maps run in individual child JVMs, the
> > > intermediate data would still need to be serialized (moved) so that your
> > > combiner can aggregate it at the node level.
> > >
> > > Best,
> > > Mahesh Balija,
> > > Calsoft Labs.
> > >
> > >
> > >
> > > On Wed, Jan 2, 2013 at 5:53 PM, Suresh S <[EMAIL PROTECTED]> wrote:
> > >
> > >> Hello,
> > >>
> > >>       I think running the combiner function at the node level (to
> > >> combine the output of all map tasks on the node) may reduce
> > >> intermediate data movement.
> > >>
> > >>      I don't know whether this technique is already available. Is it
> > >> worth working in this direction?
> > >> Any suggestions? Thanks in advance.
> > >> *Regards*
> > >> *S.Suresh,*
> > >> *Research Scholar,*
> > >> *Department of Computer Applications,*
> > >> *National Institute of Technology,*
> > >> *Tiruchirappalli - 620015.*
> > >> *+91-9941506562*
> > >>
> > >
> > >
> >
>
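Suresh's original question is whether node-level combining is worth its overhead. The toy count below is a hypothetical model, not a measurement: it compares how many combined word-count records would cross the network under each scheme, assuming every combined record has the same size and that a combiner emits exactly one record per distinct key it sees. The `Cluster` layout and the `shuffled_records` function are illustrative only.

```python
from typing import Dict, List

# Hypothetical layout: node name -> input splits (lists of words) of the map tasks on that node.
Cluster = Dict[str, List[List[str]]]


def shuffled_records(cluster: Cluster, node_level: bool) -> int:
    """Count combined (word, count) records sent to reducers in a toy word-count model."""
    total = 0
    for splits in cluster.values():
        if node_level:
            # One record per distinct word across all map tasks on the node.
            total += len(set().union(*splits))
        else:
            # One record per distinct word in each map task's own split.
            total += sum(len(set(split)) for split in splits)
    return total


cluster = {
    "node1": ["a b a c".split(), "b c c d".split()],
    "node2": ["a a d".split(), "d e".split()],
}
print(shuffled_records(cluster, node_level=False))  # 10
print(shuffled_records(cluster, node_level=True))   # 7
```

In this example node-level combining cuts the shuffled records from 10 to 7, but the extra local phase still has to re-read all 10 per-map records on the nodes, and in this model it can only finish once every map on its node has finished. The benefit depends on key overlap between maps on the same node: if their keys are disjoint, the two counts are equal and the extra phase is pure overhead.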