Jay Vyas 2012-10-20, 03:19
AFAIK, when an MR job has no reduce phase (i.e. the number of reducers is 0),
the output from the Mapper is not sorted.
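As a rough illustration of why the sort only pays off when there is a reduce phase, here is a toy Python sketch. It is not Hadoop code: the names spill and merge_and_group are made up for this example, and the in-memory lists stand in for spill files. It assumes each spill is sorted by key, so the reduce side can do a streaming k-way merge and see all values for a key together without loading every record at once.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def spill(records):
    """Simulate one map-side spill: sort the buffered (key, value) pairs by key."""
    return sorted(records, key=itemgetter(0))


def merge_and_group(sorted_runs):
    """Stream a k-way merge over already-sorted runs and group values by key.

    Because every run is sorted, heapq.merge only needs to look at the head
    of each run, and groupby sees all pairs for a key consecutively.
    """
    merged = heapq.merge(*sorted_runs, key=itemgetter(0))
    for key, pairs in groupby(merged, key=itemgetter(0)):
        yield key, [value for _, value in pairs]


# Two hypothetical spills produced by mappers.
runs = [
    spill([("b", 1), ("a", 1)]),
    spill([("a", 1), ("c", 1), ("b", 1)]),
]
result = list(merge_and_group(runs))
print(result)
```

With zero reducers nobody consumes the grouped stream, so there is nothing to gain from sorting and the map output can be written as-is.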
On Fri, Oct 19, 2012 at 8:19 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
> Is there any documentation on the internals of the shuffle and sort phase?
> The elephant book seems to be the best source, but it appears to only
> lightly touch upon the "magic" part (i.e. the distributed merge sorting and
> mapper spilling).
> Also... What is the rationale behind the sortedness of mapper outputs? Is
> the reason to optimize the streaming of mapper values to reducers? In
> simple scenarios, i.e. when there is no reducing to be done, it seems that
> we may not care to have sorted mapper outputs: a random merge of all
> spilled records would be sufficient.
> I've noticed that the Shuffle and Sort classes in Hadoop have almost no
> comments and appear to simply wrap other classes.
> Jay Vyas
Thanks & Regards,