Re: Hadoop 1.0.4 Performance Problem
+1 the way jon elaborated it.
On Fri, Dec 21, 2012 at 6:36 AM, Todd Lipcon <[EMAIL PROTECTED]> wrote:

> Hi Jon,
>
> FYI, this issue in the fair scheduler was fixed by
> https://issues.apache.org/jira/browse/MAPREDUCE-2905 for 1.1.0.
> Though it is present again in MR2:
> https://issues.apache.org/jira/browse/MAPREDUCE-3268
>
> -Todd
>
> On Wed, Nov 28, 2012 at 2:32 PM, Jon Allen <[EMAIL PROTECTED]> wrote:
> > Jie,
> >
> > Simple answer - I got lucky (though obviously there are things you need
> > to have in place to allow you to be lucky).
> >
> > Before running the upgrade I ran a set of tests to baseline the cluster
> > performance, e.g. terasort, gridmix and some operational jobs.  Terasort
> > by itself isn't very realistic as a cluster test but it's nice and simple
> > to run and is good for regression testing things after a change.
> >
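For reference, a terasort baseline like the one described is normally driven
from the examples jar that ships with Hadoop 1.x, roughly along these lines
(the row count, HDFS paths and jar location are illustrative, not taken from
the cluster discussed here):

  # generate ~100 GB of input (10^9 rows of 100 bytes), sort it, validate it
  hadoop jar $HADOOP_HOME/hadoop-examples-1.0.4.jar teragen \
      1000000000 /benchmarks/terasort-input
  hadoop jar $HADOOP_HOME/hadoop-examples-1.0.4.jar terasort \
      /benchmarks/terasort-input /benchmarks/terasort-output
  hadoop jar $HADOOP_HOME/hadoop-examples-1.0.4.jar teravalidate \
      /benchmarks/terasort-output /benchmarks/terasort-report

Keeping the teragen output around lets the same input be re-sorted before and
after the upgrade, which is what makes the before/after timings comparable.
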
> > After the upgrade the intention was to run the same tests and show that
> > the performance hadn't degraded (improved would have been nice but not
> > worse was the minimum).  When we ran the terasort we found that
> > performance was about 50% worse - execution time had gone from 40 minutes
> > to 60 minutes.  As I've said, terasort doesn't provide a realistic view
> > of operational performance but this showed that something major had
> > changed and we needed to understand it before going further.  So how to
> > go about diagnosing this ...
> >
> > First rule - understand what you're trying to achieve.  It's very easy to
> > say performance isn't good enough but performance can always be better so
> > you need to know what's realistic and at what point you're going to stop
> > tuning things.  I had a previous baseline that I was trying to match so I
> > knew what I was trying to achieve.
> >
> > Next thing to do is profile your job and identify where the problem is.
> > We had the full job history from the before and after jobs and comparing
> > these we saw that map performance was fairly consistent as were the
> > reduce sort and reduce phases.  The problem was with the shuffle, which
> > had gone from 20 minutes pre-upgrade to 40 minutes afterwards.  The
> > important thing here is to make sure you've got as much information as
> > possible.  If we'd just kept the overall job time then there would have
> > been a lot more areas to look at but knowing the problem was with shuffle
> > allowed me to focus effort in this area.
> >
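On Hadoop 1.x the per-phase timings referred to here can be pulled from the
job history on the command line as well as from the JobTracker web UI; a
rough sketch, with the job output directory as a placeholder:

  # job-level summary: launch/finish times and counters
  hadoop job -history /benchmarks/terasort-output
  # per-task detail, including when each reduce finished its shuffle and sort
  hadoop job -history all /benchmarks/terasort-output

Comparing the shuffle-finished timestamps from the pre- and post-upgrade runs
is what localises a regression to the shuffle phase rather than to the map or
reduce code.
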
> > So what had changed in the shuffle that may have slowed things down?  The
> > first thing we thought of was that we'd moved from a tarball deployment
> > to using the RPM, so what effect might this have had on things?  Our
> > operational configuration compresses the map output and in the past we've
> > had problems with Java compression libraries being used rather than
> > native ones and this has affected performance.  We knew the RPM
> > deployment had moved the native library so spent some time confirming to
> > ourselves that these were being used correctly (but this turned out not
> > to be the problem).  We then spent time doing some process and server
> > profiling - using dstat to look at the server bottlenecks and jstack/jmap
> > to check what the task tracker and reduce processes were doing.  Although
> > not directly relevant to this particular problem, doing this was useful
> > just to get my head around what Hadoop is doing at various points of the
> > process.
> >
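The checks described above map onto standard tools; a sketch, with the log
path and process id as placeholders (the exact log location depends on how
the RPM or tarball was configured):

  # did the task JVMs pick up the native compression libraries?
  # look for "Loaded the native-hadoop library" rather than
  # "Unable to load native-hadoop library ... using builtin-java classes"
  grep -r "native-hadoop library" /var/log/hadoop/userlogs/ | tail

  # per-second cpu, disk, network and memory view of a worker node
  dstat -cdnm 1

  # what a TaskTracker or reduce task JVM is doing right now
  jstack <pid>           # thread dump
  jmap -histo <pid>      # heap object histogram
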
> > The next bit was one place where I got lucky - I happened to be logged
> > onto one of the worker nodes when a test job was running and I noticed
> > that there weren't any reduce tasks running on the server.  This was odd
> > as we'd submitted more reducers than we have servers so I'd expected at
> > least one task to be running on each server.  Checking the job tracker
> > log file it turned out that since the upgrade the job tracker had been
> > submitting reduce tasks to only 10% of the available nodes.  A different
> > 10% each time the