Jon Allen 2012-11-23, 12:02
Re: Hadoop 1.0.4 Performance Problem
Jie Li 2012-12-20, 00:27
Hi Chris,
The standalone log analyzer was released in December and designed to be
easier to use. Regarding the license, I think it's ok to use it in a
commercial environment for evaluation purposes, and your feedback would
help us to improve it.

Jie

On Tue, Dec 18, 2012 at 1:02 AM, Chris Smith <[EMAIL PROTECTED]> wrote:
> Jie,
>
> Recent was over 11 months ago. :-)
>
> Unfortunately the software licence requires that most of us 'negotiate' a
> commercial use license before we trial the software in a commercial
> environment:
> http://www.cs.duke.edu/starfish/files/SOFTWARE_LICENSE_AGREEMENT.txt and as
> clarified here: http://www.cs.duke.edu/starfish/previous.html
>
> Under that last URL was a note that you were soon to distribute the source
> code under the Apache Software License. Last time I asked the reply was
> that this would not happen. Perhaps it is time to update your web pages or
> your license arrangements. :-)
>
> I like what I saw on my home 'cluster' but have not the time to sort out
> licensing to trial this in a commercial environment.
>
> Chris
>
> On 14 December 2012 01:46, Jie Li <[EMAIL PROTECTED]> wrote:
>>
>> Hi Jon,
>>
>> Thanks for sharing these insights! Can't agree with you more!
>>
>> Recently we released a tool called Starfish Hadoop Log Analyzer for
>> analyzing the job histories. I believe it can quickly point out this
>> reduce problem you met!
>>
>> http://www.cs.duke.edu/starfish/release.html
>>
>> Jie
>>
>> On Wed, Nov 28, 2012 at 5:32 PM, Jon Allen <[EMAIL PROTECTED]> wrote:
>> > Jie,
>> >
>> > Simple answer - I got lucky (though obviously there are things you need to
>> > have in place to allow you to be lucky).
>> >
>> > Before running the upgrade I ran a set of tests to baseline the cluster
>> > performance, e.g. terasort, gridmix and some operational jobs. Terasort by
>> > itself isn't very realistic as a cluster test but it's nice and simple to
>> > run and is good for regression testing things after a change.
>> >
>> > After the upgrade the intention was to run the same tests and show that the
>> > performance hadn't degraded (improved would have been nice but not worse was
>> > the minimum). When we ran the terasort we found that performance was about
>> > 50% worse - execution time had gone from 40 minutes to 60 minutes. As I've
>> > said, terasort doesn't provide a realistic view of operational performance
>> > but this showed that something major had changed and we needed to understand
>> > it before going further. So how to go about diagnosing this ...
>> >
>> > First rule - understand what you're trying to achieve. It's very easy to
>> > say performance isn't good enough but performance can always be better so
>> > you need to know what's realistic and at what point you're going to stop
>> > tuning things. I had a previous baseline that I was trying to match so I
>> > knew what I was trying to achieve.
>> >
>> > Next thing to do is profile your job and identify where the problem is. We
>> > had the full job history from the before and after jobs and comparing these
>> > we saw that map performance was fairly consistent as were the reduce sort
>> > and reduce phases. The problem was with the shuffle, which had gone from 20
>> > minutes pre-upgrade to 40 minutes afterwards. The important thing here is
>> > to make sure you've got as much information as possible. If we'd just kept
>> > the overall job time then there would have been a lot more areas to look at
>> > but knowing the problem was with shuffle allowed me to focus effort in this
>> > area.
>> >
>> > So what had changed in the shuffle that may have slowed things down? The
>> > first thing we thought of was that we'd moved from a tarball deployment to
>> > using the RPM so what effect might this have had on things? Our operational
>> > configuration compresses the map output and in the past we've had