

Re: sort phase in hadoop mapper
That makes sense, Samaneh.  I was thinking about it more coarsely.  As far
as I know, currently there is no way to skip the sort phase - you would
need to modify the code.

-Sandy
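
For reference, the role of the sort discussed in this thread can be illustrated with a small standalone sketch. It is not Hadoop's MapTask code and uses no Hadoop classes; the class name MapSideSortSketch, the Record type, and the method names are all hypothetical. It assumes hash partitioning with the formula used by the default HashPartitioner, and it compares plain Strings instead of serialized keys. Each record is tagged with a partition and the buffer is then sorted by partition and by key, so every reducer's records end up contiguous and key-ordered.

```java
import java.util.ArrayList;
import java.util.List;

public class MapSideSortSketch {

    /** One map output record, tagged with the partition (reducer) it is routed to. */
    static final class Record {
        final int partition;
        final String key;
        final int value;

        Record(int partition, String key, int value) {
            this.partition = partition;
            this.key = key;
            this.value = value;
        }
    }

    /** Hash partitioning: non-negative hash code modulo the number of reducers. */
    static int partitionFor(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    /** WordCount-style map step: emit (word, 1) per token into an in-memory buffer. */
    static List<Record> collect(String line, int numReducers) {
        List<Record> buffer = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) {
                buffer.add(new Record(partitionFor(word, numReducers), word, 1));
            }
        }
        return buffer;
    }

    /** Sort by partition first, then by key (simplification: String order, not raw bytes). */
    static void sortBuffer(List<Record> buffer) {
        buffer.sort((a, b) -> a.partition != b.partition
                ? Integer.compare(a.partition, b.partition)
                : a.key.compareTo(b.key));
    }

    public static void main(String[] args) {
        List<Record> buffer = collect("the quick brown fox jumps over the lazy dog", 2);

        // Time only the sort step, as a rough stand-in for "time spent in the sort phase".
        long start = System.nanoTime();
        sortBuffer(buffer);
        long elapsedNanos = System.nanoTime() - start;

        for (Record r : buffer) {
            System.out.println(r.partition + "\t" + r.key + "\t" + r.value);
        }
        System.out.println("sort took " + elapsedNanos + " ns");
    }
}
```

Timing sortBuffer in a toy like this gives only a rough feel for the cost of sorting. Measuring the real share of mapper time would require instrumenting or profiling MapTask itself.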
On Thu, Apr 18, 2013 at 3:42 PM, Samaneh Shokuhi
<[EMAIL PROTECTED]> wrote:

> Hi Sandy,
> As I understand it, a map task involves these phases: 1) map processing,
> 2) spilling buffer contents to disk, 3) partitioning, 4) sorting, and
> 5) merging the spill files into a single file.
> Hmm, maybe I am wrong, but I thought outputs are grouped in the partitioning
> phase and after that are sorted in the sort phase before being sent to the
> reducer. Is that what happens in the map phase?
>
> Regarding your question: I think the sort phase is one of the
> time-consuming phases in the mapper. What I am trying to do is find out what
> percentage of mapper time is spent in the sort phase and investigate whether
> it is possible to skip the sort in some cases. For example, if we have only
> one reducer, is it possible to skip the sorting and just flush the data
> directly to the reducer?
>
> Samaneh
>
>
>
> On Thu, Apr 18, 2013 at 8:46 PM, Sandy Ryza <[EMAIL PROTECTED]>
> wrote:
>
> > Hi Samaneh,
> >
> > If you want to see the map outputs post sort/shuffle, the easiest way is
> > probably to use an IdentityReducer and inspect the job.
> >
> > Can you be more specific on what you need to disable the sort phase for?
> > Sorting is used in part to group map outputs and route them to the
> > correct reducer.
> >
> > -Sandy
> >
> >
> > On Thu, Apr 18, 2013 at 1:53 AM, Samaneh Shokuhi
> > <[EMAIL PROTECTED]> wrote:
> >
> > > Hello All,
> > > I am doing some experiments with the WordCount example running on a
> > > Hadoop cluster. I have some questions:
> > >
> > > 1) How can I monitor the output from the mapper before it is flushed to
> > > the reducer? (In fact, I want to see how the keys are sorted.)
> > >
> > > 2) In one of my experiments I need to disable the sort phase in the
> > > mapper and send unsorted data to the reducer. Is there any way to disable
> > > this sort in the mapper, or do I need to modify Hadoop to disable it?
> > > As I understand it, this functionality is implemented in MapTask.java.
> > > And of course I don't want to set the number of reducers to zero, because
> > > I need to have at least one reducer.
> > >
> > > So, any idea how to disable the sort phase in the mapper and monitor the
> > > output?
> > >
> > > Best,
> > > Samaneh
> > >
> >
>