Hadoop user mailing list: Broad question on sorting of mapper outputs.


Jay Vyas 2012-10-20, 03:19
Re: Broad question on sorting of mapper outputs.
Hi Jay,

AFAIK, when the MR job has no reduce phase (i.e. the number of reducers is 0),
the output from the mapper is not sorted.

HTH,
Anil
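A toy model of the difference Anil describes may help. The sketch below is plain Python, not Hadoop code: run_map_phase, partition and word_mapper are hypothetical names, and the crc32-based partitioner is only a deterministic stand-in for Hadoop's default hash partitioning. With zero reducers (what job.setNumReduceTasks(0) requests in the Java API) the records come out in emission order; with one or more reducers they are split by partition and sorted by key within each partition.

```python
import zlib
from typing import Callable, Iterable, List, Tuple

KV = Tuple[str, int]


def partition(key: str, num_reducers: int) -> int:
    # Hypothetical stand-in for a hash partitioner. crc32 is used only because
    # it is deterministic across runs (Python's built-in hash() is salted for str).
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def run_map_phase(records: Iterable[str],
                  mapper: Callable[[str], Iterable[KV]],
                  num_reducers: int) -> List[List[KV]]:
    """Toy model of what the map side hands on (assumption, not Hadoop's code).

    num_reducers == 0: one output list, records kept in emission order.
    num_reducers > 0: one list per reducer, each sorted by key.
    """
    emitted = [kv for rec in records for kv in mapper(rec)]
    if num_reducers == 0:
        return [emitted]  # map-only: written out as-is, no partition, no sort
    parts: List[List[KV]] = [[] for _ in range(num_reducers)]
    for key, value in emitted:
        parts[partition(key, num_reducers)].append((key, value))
    # Sort each partition by key; Python's sort is stable for equal keys.
    return [sorted(p, key=lambda kv: kv[0]) for p in parts]


def word_mapper(line: str) -> Iterable[KV]:
    # Hypothetical word-count style mapper: emit (word, 1) per word.
    for word in line.split():
        yield (word, 1)
```

With lines = ["the quick fox", "a lazy dog"], zero reducers gives the words in the order they were read, while one reducer gives them in key order: a, dog, fox, lazy, quick, the.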

On Fri, Oct 19, 2012 at 8:19 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:

> Is there any documentation on the internals of the shuffle and sort phase?
> The elephant book seems to be the best source, but it appears to only
> lightly touch upon the "magic" part (i.e. the distributed merge sorting and
> mapper spilling).
>
> Also... What is the rationale behind the sortedness of mapper outputs?  Is
> the reason to optimize the streaming of mapper values to reducers?  In
> simple scenarios, i.e. when there is no reducing to be done, it seems that
> we may not care to have sorted mapper outputs : a random merge of all
> spilled records would be sufficient.
>
> I've noticed that the Shuffle and Sort classes in Hadoop have almost no
> comments and appear to simply wrap other classes.
>
> --
> Jay Vyas
> http://jayunit100.blogspot.com
>

--
Thanks & Regards,
Anil Gupta
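On Jay's spilling and merging question, the following is a simplified single-process sketch of the idea, assuming records have already been routed to one partition: records collect in a bounded buffer, each full buffer is sorted and "spilled" as a run, the runs are combined with a k-way merge, and the reducer then groups values in one streaming pass. The function names and the buffer_size parameter are hypothetical; Hadoop's real implementation also handles partitions, optional combiners and on-disk spill files, and merges again on the reduce side.

```python
import heapq
from itertools import groupby
from operator import itemgetter
from typing import Iterable, Iterator, List, Tuple

KV = Tuple[str, int]


def spill_sorted_runs(records: Iterable[KV], buffer_size: int) -> List[List[KV]]:
    """Collect records in a bounded buffer; sort and 'spill' it whenever full."""
    runs: List[List[KV]] = []
    buffer: List[KV] = []
    for kv in records:
        buffer.append(kv)
        if len(buffer) == buffer_size:
            runs.append(sorted(buffer, key=itemgetter(0)))  # one sorted spill
            buffer = []
    if buffer:
        runs.append(sorted(buffer, key=itemgetter(0)))  # final partial spill
    return runs


def merge_runs(runs: List[List[KV]]) -> Iterator[KV]:
    """Lazy k-way merge of already-sorted runs; never re-sorts everything."""
    return heapq.merge(*runs, key=itemgetter(0))


def reduce_sorted(stream: Iterable[KV]) -> Iterator[Tuple[str, int]]:
    """One streaming pass: groupby only merges *adjacent* equal keys,
    so each key's values form one group only if the input is sorted."""
    for key, group in groupby(stream, key=itemgetter(0)):
        yield key, sum(v for _, v in group)
```

Feeding the pairs ("b",1), ("a",1), ("c",1), ("a",1), ("b",1) through spill_sorted_runs with buffer_size=2, then merge_runs and reduce_sorted, yields ("a",2), ("b",2), ("c",1). Passing the same unsorted pairs directly to reduce_sorted yields five one-element groups instead. This suggests why the framework sorts before reducing: the reducer can finish one key before moving to the next without holding every key in memory. A map-only job has no such consumer, which is consistent with the sort being skipped there.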