Hadoop tool-kit for monitoring
Mark question 2011-05-17, 20:01
Hi,
I need to use hadoop-toolkit for monitoring. I followed http://code.google.com/p/hadoop-toolkit/source/checkout and applied the patch in my hadoop.20.2 directory with: patch -p0 < patch.20.2
I then set the property "mapred.performance.diagnose" to true in mapred-site.xml, but I don't see the memory information that is supposed to be shown according to http://code.google.com/p/hadoop-toolkit/wiki/HadoopPerformanceMonitoring
I then installed hadoop-0.21.0 and set only the same property as above, but I still don't see the expected monitoring information. ...
What am I doing wrong?
I appreciate any thoughts,
Mark
Re: Hadoop tool-kit for monitoring
Allen Wittenauer 2011-05-17, 21:58
On May 17, 2011, at 1:01 PM, Mark question wrote:
> Hi
>
> I need to use hadoop-tool-kit for monitoring. So I followed
> http://code.google.com/p/hadoop-toolkit/source/checkout
> and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2
Looking at the code, be aware that this is going to give incorrect results/suggestions for certain stats it generates when multiple jobs are running.
It also seems to lack "the algorithm should be rewritten" and "the data was loaded incorrectly" suggestions, which are usually the proper answer for perf problems 80% of the time.
Re: Hadoop tool-kit for monitoring
Mark question 2011-05-17, 22:11
So what other memory consumption tools do you suggest? I don't want to collect statistics manually and dump them into a file, because the IO will affect performance too.
Thanks,
Mark
Re: Hadoop tool-kit for monitoring
Allen Wittenauer 2011-05-17, 22:15
On May 17, 2011, at 3:11 PM, Mark question wrote:
> So what other memory consumption tools do you suggest? I don't want to do it
> manually and dump statistics into file because IO will affect performance
> too.
We watch memory with Ganglia. We also tune our systems so that a task will only take X amount of memory. In other words, given 8 GB of RAM:
1 GB for the OS
1 GB for the TT and DN (TaskTracker and DataNode)
6 GB for all tasks
If we assume each task will take at most 1 GB, that leaves room for 6 concurrent tasks, so we end up with 3 maps and 3 reducers.
Keep in mind that the mem consumed is more than just JVM heap size.
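The budgeting arithmetic above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `task_slots` and the even map/reduce split (with maps getting the extra slot when the total is odd) are hypothetical choices, not something from the thread. On a 0.20-era TaskTracker the resulting counts would typically be set through `mapred.tasktracker.map.tasks.maximum` and `mapred.tasktracker.reduce.tasks.maximum`.

```python
def task_slots(total_gb, os_gb, daemons_gb, per_task_gb):
    """Return (map_slots, reduce_slots) for one worker node.

    total_gb     -- physical RAM on the node
    os_gb        -- memory reserved for the operating system
    daemons_gb   -- memory reserved for the TaskTracker and DataNode
    per_task_gb  -- assumed worst-case footprint of a single task
    """
    available = total_gb - os_gb - daemons_gb
    if available <= 0 or per_task_gb <= 0:
        raise ValueError("no memory left for tasks with these settings")

    # Total number of tasks that fit in the remaining memory.
    slots = int(available // per_task_gb)

    # Assumption: split evenly, giving maps the extra slot if odd.
    reduce_slots = slots // 2
    map_slots = slots - reduce_slots
    return map_slots, reduce_slots


# The example from the message: 8 GB node, 1 GB OS, 1 GB daemons, 1 GB per task.
print(task_slots(8, 1, 1, 1))  # (3, 3)
```

Because a task's memory use is more than its JVM heap, the per-task figure here should be read as the whole process footprint; the heap limit given to each child JVM would need to be somewhat smaller than that.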
Re: Hadoop tool-kit for monitoring
Konstantin Boudnik 2011-05-17, 22:16
Also, it seems like Ganglia would be very well complemented by Nagios, which lets you monitor the overall health of your cluster.
--
Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616 6115 220F 6980 1F27 E622
Disclaimer: Opinions expressed in this email are those of the author, and do not necessarily represent the views of any company the author might be affiliated with at the moment of writing.
Re: Hadoop tool-kit for monitoring
Mark question 2011-05-17, 22:54
Thanks for the input, but I'm running on a university cluster, not my own, so are assumptions such as each task (mapper/reducer) taking 1 GB valid?
So I guess that to tune performance I should try running the job multiple times and rely on execution time as an indicator of success.
Thanks again, Mark
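The "run it several times and compare execution time" approach can be sketched as below. This is a minimal sketch, not a recommended benchmark harness: the helper names `time_runs` and `summarize` are hypothetical, and the command shown is a placeholder that should be replaced with your own job submission command. On a shared cluster, other users' jobs add noise, so the minimum or median of several runs is usually more informative than any single run.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, repeats=3):
    """Run `cmd` (a list of arguments) `repeats` times; return wall-clock seconds per run."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True raises CalledProcessError if the command fails,
        # so a failed run is never counted as a fast run.
        subprocess.run(cmd, check=True)
        durations.append(time.perf_counter() - start)
    return durations


def summarize(durations):
    """Return min, median and max of the measured durations."""
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


if __name__ == "__main__":
    # Placeholder command (an assumption): substitute the real job command here.
    runs = time_runs([sys.executable, "-c", "pass"], repeats=3)
    print(summarize(runs))
```

Comparing the summaries from two configurations, rather than two single runs, makes it less likely that a difference is just cluster load.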