Re: sort phase in hadoop mapper
Sandy Ryza 2013-04-19, 01:29
That makes sense, Samaneh. I was thinking about it more coarsely. As far
as I know, currently there is no way to skip the sort phase - you would
need to modify the code.
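
To make the role of the sort concrete, here is a minimal plain-Java sketch of what the map-side sort does conceptually: records are ordered by partition first and then by key, so each reducer's records sit together and equal keys are adjacent. This is not Hadoop code. The class `MapSideSortSketch`, the record type `Rec`, and the method names are hypothetical and exist only for illustration; the partition formula mirrors the default hash partitioner.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Illustration only: none of these names are Hadoop classes.
class MapSideSortSketch {

    // One map output record, tagged with the partition (reducer) it belongs to.
    static final class Rec {
        final int partition;
        final String key;
        final int value;

        Rec(int partition, String key, int value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    // Same formula as the default hash partitioner: non-negative hash modulo reducer count.
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Tag each key with its partition, then order by (partition, key).
    static List<Rec> sortForSpill(List<String> keys, int numReducers) {
        List<Rec> out = new ArrayList<>();
        for (String k : keys) {
            out.add(new Rec(partitionFor(k, numReducers), k, 1));
        }
        out.sort(Comparator.comparingInt((Rec r) -> r.partition)
                .thenComparing((Rec r) -> r.key));
        return out;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("the", "cat", "the", "dog", "cat");
        for (Rec r : sortForSpill(words, 2)) {
            System.out.println(r.partition + "\t" + r.key + "\t" + r.value);
        }
    }
}
```

With a single reducer every record lands in partition 0, so the sort reduces to ordering by key. The reduce side still relies on that ordering to merge map outputs and group values per key, which is why dropping it would mean changing the code rather than a setting.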
On Thu, Apr 18, 2013 at 3:42 PM, Samaneh Shokuhi wrote:
> Hi Sandy,
> As I understand it, a map task involves these phases: 1) map processing,
> 2) spilling buffer contents to disk, 3) partitioning, 4) sorting, 5) merging
> spill files into a single file.
> Hmm, maybe I am wrong, but I thought outputs are grouped in the partitioning
> phase and then sorted in the sort phase before being sent to the
> reducer. Is that what happens in the map phase?
> Regarding your question: I think the sort phase is one of the
> time-consuming phases in the mapper. What I am trying to do is find out what
> percentage of mapper time is spent in the sort phase and investigate whether
> it is possible to skip the sort in some cases. For example, if we have only
> one reducer, is it possible to skip the sorting and just flush the data
> directly to the reducer?
> On Thu, Apr 18, 2013 at 8:46 PM, Sandy Ryza <[EMAIL PROTECTED]> wrote:
> > Hi Samaneh,
> > If you want to see the map outputs post sort/shuffle, the easiest way is
> > probably to use an IdentityReducer and inspect the job.
> > Can you be more specific on what you need to disable the sort phase for?
> > Sorting is used in part to group map outputs and route them to the
> > reducer.
> > -Sandy
> > On Thu, Apr 18, 2013 at 1:53 AM, Samaneh Shokuhi
> > <[EMAIL PROTECTED]> wrote:
> > > Hello All,
> > > I am running some experiments with the WordCount example on a Hadoop
> > > cluster. I have some questions:
> > >
> > > 1) How can I monitor the output from the mapper before it is flushed to
> > > the reducer? (In fact, I want to see how the keys are sorted.)
> > >
> > > 2) In one of my experiments I need to disable the sort phase in the
> > > mapper and send unsorted data to the reducer. Is there any way to disable
> > > this sort in the mapper, or do I need to modify Hadoop to disable it?
> > > As I understand it, this functionality is implemented in MapTask.java.
> > > And of course I don't want to set the number of reducers to zero, because
> > > I want to have at least one reducer.
> > >
> > > So, any idea how to disable the sort phase in the mapper and monitor the
> > > output?
> > >
> > > Best,
> > > Samaneh
> > >