running Combiner for all the task on the node.
Suresh S 2013-01-02, 12:23
Hello,
I think running the combiner function at the node level (combining the output of all the map tasks on that node) may reduce intermediate data movement.
I don't know whether this technique is already available. Is it worth working in this direction? Any suggestions?
Thanks in advance.
*Regards*
*S.Suresh,*
*Research Scholar,*
*Department of Computer Applications,*
*National Institute of Technology,*
*Tiruchirappalli - 620015.*
*+91-9941506562*
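To make the proposal concrete, here is a minimal Python simulation of a word count. It assumes a sum combiner and made-up input split across two hypothetical nodes, and it only counts the (key, value) records that would leave each node for the reducers; it does not use any Hadoop API, and the names `NODES`, `combine` and `shuffled_records` are illustrative.

```python
from collections import Counter

# Hypothetical input: the words seen by each map task, grouped by the node it ran on.
NODES = {
    "node1": [["a", "b", "a"], ["a", "c"]],
    "node2": [["b", "b", "c"], ["c", "a"]],
}

def map_output(words):
    """Word-count map: emit (word, 1) for every word."""
    return [(w, 1) for w in words]

def combine(pairs):
    """Sum-combiner: collapse pairs with the same key into one (key, total) pair."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())

def shuffled_records(nodes, level):
    """Count the records that would leave the nodes for the reducers.

    level: "none" -> no combiner
           "map"  -> combiner applied to each map task's output separately
           "node" -> combiner applied once to all map outputs on a node
    """
    total = 0
    for tasks in nodes.values():
        outputs = [map_output(words) for words in tasks]
        if level == "none":
            total += sum(len(out) for out in outputs)
        elif level == "map":
            total += sum(len(combine(out)) for out in outputs)
        elif level == "node":
            # flatten all map outputs of this node, then combine once
            total += len(combine([pair for out in outputs for pair in out]))
        else:
            raise ValueError(level)
    return total

for level in ("none", "map", "node"):
    print(level, shuffled_records(NODES, level))
```

On this toy input it prints `none 10`, `map 8` and `node 6`: per-map combining still ships a key once per map task, while node-level combining ships it once per node. Real savings depend entirely on how often keys repeat across the map tasks of a node.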
Re: running Combiner for all the task on the node.
Harsh J 2013-01-02, 17:48
Hi Suresh,
Work on this has been in progress for a while; please see https://issues.apache.org/jira/browse/MAPREDUCE-4502 and its sub-tasks.
--
Harsh J
Re: running Combiner for all the task on the node.
Mahesh Balija 2013-01-02, 12:44
Hi Suresh,
The combiner function aggregates the data from a single map instance, NOT from all the maps running on a given node. AFAIK, because each map runs in its own child JVM, the intermediate data would still need to be serialized (moved) before your combiner could aggregate it at the node level.
Best,
Mahesh Balija,
Calsoft Labs.
Re: running Combiner for all the task on the node.
Mahesh Balija 2013-01-02, 12:50
Continued,
Also, one more shuffle and sort phase would have to occur so that the outputs can be merged/combined properly. So you should decide whether the overhead of this additional shuffle and sort phase is worth the benefit of combining per node.
Best,
Mahesh Balija,
Calsoft Labs.
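The extra local phase described above can be sketched as follows, assuming each map task's combined output is already sorted by key (as map output is within each partition). Under that assumption the node-level step only needs a k-way merge of the sorted runs followed by a second combine, not a full re-sort. This is a single-process Python simulation; `node_level_combine` and `sum_combiner` are hypothetical names, and in practice the per-map outputs would first have to be serialized and read back from each task, as noted earlier in the thread.

```python
import heapq
from itertools import groupby
from operator import itemgetter

def node_level_combine(sorted_map_outputs, combiner):
    """Merge already-sorted per-map outputs from one node and re-apply the combiner.

    sorted_map_outputs: list of lists of (key, value), each sorted by key.
    combiner: function(key, values) -> combined value (hypothetical interface).
    """
    # k-way merge of the sorted runs: the extra local "sort" is only a merge here
    merged = heapq.merge(*sorted_map_outputs, key=itemgetter(0))
    # group adjacent records that share a key and combine them again
    return [
        (key, combiner(key, [value for _, value in group]))
        for key, group in groupby(merged, key=itemgetter(0))
    ]

def sum_combiner(key, values):
    """Word-count style combiner: add up the partial counts for one key."""
    return sum(values)

# Per-map combined outputs from two map tasks on the same node (made-up data)
map1 = [("a", 2), ("b", 1)]
map2 = [("a", 1), ("c", 1)]
print(node_level_combine([map1, map2], sum_combiner))
```

This prints `[('a', 3), ('b', 1), ('c', 1)]`. Merging k sorted runs of n total records costs roughly O(n log k), which is the local overhead to weigh against the shuffle savings.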
Re: running Combiner for all the task on the node.
Suresh S 2013-01-02, 13:04
It will definitely add some overhead. However, it leads to less intermediate data movement and less time in the reduce phase, and this benefit may improve overall performance.
*Regards*
*S.Suresh,*
*Research Scholar,*
*Department of Computer Applications,*
*National Institute of Technology,*
*Tiruchirappalli - 620015.*
*+91-9941506562*
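The trade-off being discussed can be framed as a rough inequality. The symbols are introduced here purely for illustration and do not come from the thread: R_map is the number of records leaving a node with per-map combining, R_node the number with node-level combining, s the average serialized record size, B the effective per-node network bandwidth, ΔT_reduce the reduce-side time saved by receiving fewer records, and T_local the extra local time to serialize, merge and re-combine the map outputs. Node-level combining pays off, under this simplified model, roughly when

```latex
\frac{(R_{\text{map}} - R_{\text{node}})\, s}{B} + \Delta T_{\text{reduce}} \;>\; T_{\text{local}}
```

The model ignores effects such as map-output compression and shuffle overlapping with still-running map tasks, so it is only a starting point for measurement.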
Re: running Combiner for all the task on the node.
Mahesh Balija 2013-01-02, 14:06
Hi Suresh,
I mean that in the current approach the combine phase occurs per mapper instance only, NOT per node. I think an additional local shuffle and sort phase would have to happen so that you can combine per node.
But I am NOT really sure whether this is achievable. You can give it a try and see whether you run into any complications.
Thanks,
Mahesh Balija,
Calsoft Labs.