MapReduce, mail # dev - sort phase in hadoop mapper


Re: sort phase in hadoop mapper
Sandy Ryza 2013-04-19, 01:29
That makes sense, Samaneh.  I was thinking about it more coarsely.  As far
as I know, currently there is no way to skip the sort phase - you would
need to modify the code.

-Sandy
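
One reason the sort cannot simply be switched off is that the reduce side relies on it for grouping: a reducer gets one reduce() call per run of adjacent equal keys in its merged input. The sketch below is a toy, stdlib-only Java illustration of that idea, not Hadoop code; the names GroupBySortedRunsSketch and reduceAdjacentRuns are hypothetical. It counts words by walking a key stream and closing a group whenever the key changes, which only gives correct totals when the stream is sorted.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Toy illustration (not Hadoop code): grouping by runs of adjacent equal keys. */
final class GroupBySortedRunsSketch {

    /**
     * Counts each run of adjacent equal keys and emits "key=count" per run.
     * This mimics a reducer that assumes its input is already sorted by key.
     */
    static List<String> reduceAdjacentRuns(List<String> keys) {
        List<String> out = new ArrayList<>();
        String current = null;
        int count = 0;
        for (String k : keys) {
            if (!k.equals(current)) {
                // Key changed: close the previous group, if there was one.
                if (current != null) {
                    out.add(current + "=" + count);
                }
                current = k;
                count = 0;
            }
            count++;
        }
        if (current != null) {
            out.add(current + "=" + count);
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> unsorted = Arrays.asList("b", "a", "b", "a");
        List<String> sorted = new ArrayList<>(unsorted);
        Collections.sort(sorted);

        // Unsorted input: every key change starts a new group, so counts are split.
        System.out.println(reduceAdjacentRuns(unsorted)); // [b=1, a=1, b=1, a=1]
        // Sorted input: each key forms a single run, so counts are complete.
        System.out.println(reduceAdjacentRuns(sorted));   // [a=2, b=2]
    }
}
```

With unsorted input, even a single reducer would see the same key in several separate groups, so skipping the sort would also mean replacing this run-based grouping with something else, such as hash-based aggregation inside the reducer.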
On Thu, Apr 18, 2013 at 3:42 PM, Samaneh Shokuhi
<[EMAIL PROTECTED]> wrote:

> Hi Sandy,
> As I understand it, a map task involves these phases: 1) map processing,
> 2) spilling buffer contents to disk, 3) partitioning, 4) sorting, and
> 5) merging the spill files into a single file.
> Hmm, maybe I am wrong, but I thought the outputs are grouped in the
> partitioning phase and then sorted in the sort phase before being sent to
> the reducer. Is that what happens in the map phase?
>
> Regarding your question: I think the sort phase is one of the
> time-consuming phases in the mapper. What I am trying to do is find out
> what percentage of mapper time is spent on the sort phase and investigate
> whether it is possible to skip the sort in some cases. For example, if we
> have only one reducer, is it possible to skip the sorting and just flush
> the data directly to the reducer?
>
> Samaneh
>
>
>
> On Thu, Apr 18, 2013 at 8:46 PM, Sandy Ryza <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Samaneh,
> >
> > If you want to see the map outputs post sort/shuffle, the easiest way is
> > probably to use an IdentityReducer and inspect the job.
> >
> > Can you be more specific on what you need to disable the sort phase for?
> > Sorting is used in part to group map outputs and route them to the
> > correct reducer.
> >
> > -Sandy
> >
> >
> > On Thu, Apr 18, 2013 at 1:53 AM, Samaneh Shokuhi
> > <[EMAIL PROTECTED]> wrote:
> >
> > > Hello All,
> > > I am doing some experiments with the WordCount example running on a
> > > Hadoop cluster. I have some questions:
> > >
> > > 1) How can I monitor the output from the mapper before it is flushed to
> > > the reducer? (In fact, I want to see how the keys are sorted.)
> > >
> > > 2) In one of my experiments I need to disable the sort phase in the
> > > mapper and send unsorted data to the reducer. Is there any way to
> > > disable this sort in the mapper, or do I need to modify Hadoop to
> > > disable it?
> > > As I understand it, this functionality is implemented in MapTask.java.
> > > And of course I don't want to set the number of reducers to zero,
> > > because I need to have at least one reducer.
> > >
> > > So, any idea how to disable the sort phase in the mapper and monitor
> > > the output?
> > >
> > > Best,
> > > Samaneh
> > >
> >
>
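
The map-side phases discussed in this thread can be modeled with a small, stdlib-only Java sketch. It is a toy, not Hadoop code, and the names MapSideSortSketch, Record, partitionFor and collectAndSort are hypothetical. It assumes, as the MapTask code appears to do, that each record is tagged with a partition when it is collected and that the buffer is then ordered by partition first and key second before being written out. The partition formula is the one used by Hadoop's default HashPartitioner; note that String.hashCode() here is only a stand-in for the hash of a real Hadoop key type such as Text.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Toy model of map-side partitioning and sorting (not Hadoop code). */
final class MapSideSortSketch {

    /** One buffered map output record, tagged with its target partition. */
    static final class Record {
        final int partition;
        final String key;
        final int value;

        Record(int partition, String key, int value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    /** Same formula as Hadoop's default HashPartitioner, applied to a String key. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /**
     * Tags every word with a partition (WordCount-style value of 1), then
     * sorts the buffer by (partition, key), roughly what happens before a spill.
     */
    static List<Record> collectAndSort(List<String> words, int numReducers) {
        List<Record> buffer = new ArrayList<>();
        for (String word : words) {
            buffer.add(new Record(partitionFor(word, numReducers), word, 1));
        }
        buffer.sort(Comparator
                .comparingInt((Record r) -> r.partition)
                .thenComparing((Record r) -> r.key));
        return buffer;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "apple", "fig", "apple");
        // Print partition, key and value for each record in spill order.
        for (Record r : collectAndSort(words, 2)) {
            System.out.println(r.partition + "\t" + r.key + "\t" + r.value);
        }
    }
}
```

With a single reducer every record lands in partition 0, so the partition comparison never decides anything and only the key ordering remains. That is the case Samaneh asks about: the routing role of the sort disappears, but the grouping role does not.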