Re: Combiner Execution
Hi,

I'm working on node-level aggregation for MapReduce. Please see the
following JIRA:
https://issues.apache.org/jira/browse/MAPREDUCE-4502
I'm waiting for review by the community.

It can also be implemented in Tez, as Bikas and Gopal mentioned.

Thanks,

On Wed, Oct 23, 2013 at 1:28 AM, Bikas Saha <[EMAIL PROTECTED]> wrote:
> +1. A node level or rack level or any level intermediate combiner is
> fairly straightforward to add in Tez. Please carry over your question to
> the Apache Tez dev mailing list [EMAIL PROTECTED] if you are
> interested in following that path.
>
> Bikas
>
> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of
> Gopal Vijayaraghavan
> Sent: Tuesday, October 22, 2013 9:03 AM
> To: [EMAIL PROTECTED]
> Subject: Re: Combiner Execution
>
> Hi,
>
> I'll answer your questions in reverse.
>
>> According to http://developer.yahoo.com/hadoop/tutorial/module4.html the
>> output is already combined over all Mappers in a node. But we cannot find
>> how this is happening. Can someone point us to where this combiner is
>> executed?
>
> You'll find the Combiner runner buried inside MapTask.java; search for
> combinerRunner in there.
>
> The Combiner only combines the output of a single map task (after
> sorting). It kicks in only if the number of spills in that one map task
> is greater than minSpillsForCombine.
>
> It does not do any cross-task actions, and the MR framework (as it is
> today) doesn't leave enough room for scheduling a cross-task activity
> (i.e., MR is strictly bipartite).
>
>> For a class project my group and I are looking to experiment with
>> combining the output from Mappers on the same node or in the same rack. We
>> found the idea at http://wiki.apache.org/hadoop/HadoopResearchProjects.
>
> Your general idea is roughly sketched out in Apache Tez, which is
> designed to be more flexible with its plumbing (per-host/per-rack
> multi-level combiner trees):
> https://issues.apache.org/jira/browse/TEZ-145
>
> Cheers,
> Gopal
>
> --
> CONFIDENTIALITY NOTICE
> NOTICE: This message is intended for the use of the individual or entity
> to which it is addressed and may contain information that is confidential,
> privileged and exempt from disclosure under applicable law. If the reader
> of this message is not the intended recipient, you are hereby notified
> that any printing, copying, dissemination, distribution, disclosure or
> forwarding of this communication is strictly prohibited. If you have
> received this communication in error, please contact the sender
> immediately and delete it from your system. Thank You.

--
- Tsuyoshi
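
The difference between the per-task combiner described above and a node-level aggregation stage can be illustrated with a small toy simulation. This is only a sketch in plain Python, not Hadoop or Tez code: the function names, the task and node labels, and the word-count style data are all hypothetical, and the node-level stage is an assumed extra step rather than anything the current MR framework provides.

```python
from collections import defaultdict


def combine(pairs):
    """Sum values per key and return sorted (key, total) pairs."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return sorted(totals.items())


def per_task_combine(task_outputs):
    """Combine each map task's output independently (no cross-task work)."""
    return {task: combine(pairs) for task, pairs in task_outputs.items()}


def node_level_combine(combined_by_task, task_to_node):
    """Hypothetical extra stage: merge the combined outputs of all tasks
    that ran on the same node before anything is shuffled."""
    by_node = defaultdict(list)
    for task, pairs in combined_by_task.items():
        by_node[task_to_node[task]].extend(pairs)
    return {node: combine(pairs) for node, pairs in by_node.items()}


# Hypothetical map outputs: three tasks, two of them on the same node.
task_outputs = {
    "map_0": [("a", 1), ("b", 1), ("a", 1)],
    "map_1": [("a", 1), ("c", 1)],
    "map_2": [("b", 1), ("b", 1)],
}
task_to_node = {"map_0": "node_A", "map_1": "node_A", "map_2": "node_B"}

per_task = per_task_combine(task_outputs)
per_node = node_level_combine(per_task, task_to_node)

# Number of records that would be sent to the reducers in each scheme.
records_per_task = sum(len(pairs) for pairs in per_task.values())
records_per_node = sum(len(pairs) for pairs in per_node.values())
```

In this toy data the key "a" is emitted by both tasks on node_A, so only the node-level stage can merge those two records into one; the per-task combiner never sees output from another task.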