
Jay Vyas 2012-10-20, 03:19
Re: Broad question on sorting of mapper outputs.
Hi Jay,

AFAIK, when an MR job has no reduce phase (i.e. the number of reducers is 0),
the output from the mappers is not sorted.
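
On the rationale question: as far as I understand it, each spill is sorted
by key so that the sorted runs can later be merged in a single streaming
pass, and the reducer then sees all values for a key next to each other.
Below is a minimal sketch of that merge idea in plain Java. It is not
Hadoop's actual code; the class SpillMergeSketch, the method
mergeSortedRuns and the sample keys are hypothetical names made up for
illustration, and real spills hold serialized key/value pairs on disk
rather than in-memory lists of strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration only: not a Hadoop class.
public class SpillMergeSketch {

    // Tracks the next unread record within one already-sorted run.
    private static final class Cursor {
        final List<String> run;
        int index;

        Cursor(List<String> run) {
            this.run = run;
        }

        String key() {
            return run.get(index);
        }
    }

    /** Merges runs that are each sorted by key into one sorted list. */
    public static List<String> mergeSortedRuns(List<List<String>> runs) {
        // The heap always exposes the run whose next key is smallest.
        PriorityQueue<Cursor> heap =
            new PriorityQueue<>((a, b) -> a.key().compareTo(b.key()));
        for (List<String> run : runs) {
            if (!run.isEmpty()) {
                heap.add(new Cursor(run));
            }
        }

        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            Cursor smallest = heap.poll();
            merged.add(smallest.key());
            smallest.index++;
            // Put the run back only if it still has unread records.
            if (smallest.index < smallest.run.size()) {
                heap.add(smallest);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        // Three sorted "spills" (assumed sample data).
        List<List<String>> runs = Arrays.asList(
            Arrays.asList("apple", "cherry", "cherry"),
            Arrays.asList("banana", "cherry"),
            Arrays.asList("apple", "date"));
        // prints [apple, apple, banana, cherry, cherry, cherry, date]
        System.out.println(mergeSortedRuns(runs));
    }
}
```

Because every run is already sorted, the merge only ever looks at the head
of each run, so equal keys come out adjacent without loading everything
into memory. With zero reducers there is nothing to group, which fits with
the sort being skipped in that case.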


On Fri, Oct 19, 2012 at 8:19 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:

> Is there any documentation on the internals of the shuffle and sort phase?
> The elephant book seems to be the best source, but it appears to only
> lightly touch upon the "magic" part (i.e. the distributed merge sorting and
> mapper spilling).
> Also... What is the rationale behind the sortedness of mapper outputs?  Is
> the reason to optimize the streaming of mapper values to reducers?  In
> simple scenarios, i.e. when there is no reducing to be done, it seems that
> we may not care to have sorted mapper outputs: a random merge of all
> spilled records would be sufficient.
> I've noticed that the Shuffle and Sort classes in Hadoop have almost no
> comments and appear to simply wrap other classes.
> --
> Jay Vyas
> http://jayunit100.blogspot.com

Thanks & Regards,
Anil Gupta