Hadoop >> mail # dev >> Combiner Execution


Re: Combiner Execution
Hi,

I'm working on node-level aggregation for MapReduce. Please see the
following JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-4502
I'm waiting for a review from the community.

It can also be implemented in Tez, as Bikas and Gopal mentioned.
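
To make the distinction in this thread concrete: the stock combiner works on
the output of one map task, while node-level aggregation would merge the
already-combined outputs of several map tasks on the same node before the
shuffle. Below is a minimal plain-Java sketch of that idea. It is not the
MAPREDUCE-4502 patch and uses no Hadoop classes; the class and method names
are hypothetical, and a word-count style sum stands in for the combiner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration only: no Hadoop classes, all names are hypothetical.
class NodeLevelAggregationSketch {

    // Stand-in for a per-task combiner: sums the counts for each key
    // emitted by ONE map task.
    static Map<String, Integer> combineTaskOutput(List<String> emittedKeys) {
        Map<String, Integer> combined = new TreeMap<>();
        for (String key : emittedKeys) {
            combined.merge(key, 1, Integer::sum);
        }
        return combined;
    }

    // Hypothetical node-level step: merges the already-combined outputs of
    // several map tasks that ran on the same node, before anything is shuffled.
    static Map<String, Integer> aggregateOnNode(List<Map<String, Integer>> perTaskOutputs) {
        Map<String, Integer> nodeLevel = new TreeMap<>();
        for (Map<String, Integer> taskOutput : perTaskOutputs) {
            taskOutput.forEach((key, count) -> nodeLevel.merge(key, count, Integer::sum));
        }
        return nodeLevel;
    }

    public static void main(String[] args) {
        // Two map tasks on the same node, each combined independently.
        List<Map<String, Integer>> perTask = new ArrayList<>();
        perTask.add(combineTaskOutput(Arrays.asList("a", "b", "a")));
        perTask.add(combineTaskOutput(Arrays.asList("b", "c")));

        // Records that would be shuffled with per-task combining only.
        int recordsPerTaskOnly = perTask.stream().mapToInt(Map::size).sum();

        // Records that would be shuffled after the extra node-level step.
        Map<String, Integer> nodeLevel = aggregateOnNode(perTask);

        System.out.println("per-task combining only: " + recordsPerTaskOnly + " records");
        System.out.println("with node-level aggregation: " + nodeLevel.size() + " records");
        System.out.println(nodeLevel);
    }
}
```

The saving comes only from keys that appear in more than one map task on the
same node (here "b"), so the benefit depends on how much the key sets of
co-located tasks overlap.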

Thanks,

On Wed, Oct 23, 2013 at 1:28 AM, Bikas Saha <[EMAIL PROTECTED]> wrote:
> +1. A node level or rack level or any level intermediate combiner is
> fairly straightforward to add in Tez. Please carry over your question to
> the Apache Tez dev mailing list [EMAIL PROTECTED] if you are
> interested in following that path.
>
> Bikas
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Gopal Vijayaraghavan
> Sent: Tuesday, October 22, 2013 9:03 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Combiner Execution
>
> Hi,
>
> I'll answer your questions in reverse.
>
>> According to http://developer.yahoo.com/hadoop/tutorial/module4.html the
>> output is already combined over all Mappers in a node. But we can not find
>> how this is happening. Can someone point us to where this combiner is
>> executed?
>
> You'll find the combiner runner buried inside MapTask.java; hunt for
> combinerRunner in there.
>
> The Combiner only combines the output of a single map task (after
> sorting). This kicks in only if the number of spills in that one map task
> exceeds minSpillsForCombine.
>
> It does not do any cross-task actions, and the MR framework (as it is
> today) doesn't leave enough room for scheduling a cross-task activity
> (i.e., MR is strictly bipartite).
>
>> For a class project my group and I are looking to experiment with
>> combining the output from Mappers on the same node or in the same rack. We
>> found the idea at http://wiki.apache.org/hadoop/HadoopResearchProjects.
>
> Your general idea is sort of chalked out in Apache Tez, which is designed
> to be more flexible with its plumbing (per-host/per-rack multi-level
> combiner trees) -
> https://issues.apache.org/jira/browse/TEZ-145
>
> Cheers,
> Gopal
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified
> that any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender
> immediately and delete it from your system. Thank You.

--
- Tsuyoshi