Re: Profiling Hadoop Job
Hi Leonardo,

You might want to try Starfish, which supports memory profiling as well as
CPU/disk/network profiling for performance tuning.

Jie
------------------
Starfish is an intelligent performance tuning tool for Hadoop.
Homepage: www.cs.duke.edu/starfish/
Mailing list: http://groups.google.com/group/hadoop-starfish
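
From the stack trace below, the failure happens in JobClient.downloadProfile,
which saves the attempt_*.profile files into the client's current working
directory on the local filesystem (not HDFS) — so "Permission denied" usually
means the directory the job was submitted from is not writable by the
submitting user. A minimal, hedged Java sketch of that check (the temp
directory here just stands in for wherever you submit the job from):

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

public class ProfileDirCheck {
    public static void main(String[] args) throws IOException {
        // JobClient.downloadProfile(...) writes attempt_*.profile into the
        // client's current working directory. Before submitting a profiled
        // job, verify that directory is writable; a temp dir is used here
        // purely as an illustrative stand-in for the submit directory.
        File submitDir = Files.createTempDirectory("profile-out").toFile();
        System.out.println(submitDir.canWrite() ? "writable" : "not writable");
    }
}
```

If the check fails on the cluster, submitting from a user-writable directory
(or fixing local permissions there) should let the client download the
profiles automatically, without any HDFS permission changes.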
On Wed, Mar 7, 2012 at 2:36 PM, Leonardo Urbina <[EMAIL PROTECTED]> wrote:

> Hello everyone,
>
> I have a Hadoop job that I run on several GBs of data that I am trying to
> optimize in order to reduce the memory consumption as well as improve the
> speed. I am following the steps outlined in Tom White's "Hadoop: The
> Definitive Guide" for profiling using HPROF (p161), by setting the
> following properties in the JobConf:
>
>        job.setProfileEnabled(true);
>        job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
>                "force=n,thread=y,verbose=n,file=%s");
>        job.setProfileTaskRange(true, "0-2");
>        job.setProfileTaskRange(false, "0-2");
>
> I am trying to run this locally on a single-node pseudo-distributed install
> of Hadoop (0.20.2), and it fails with the following error:
>
> Exception in thread "main" java.io.FileNotFoundException:
> attempt_201203071311_0004_m_000000_0.profile (Permission denied)
>        at java.io.FileOutputStream.open(Native Method)
>        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
>        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
>        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
>        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> However, I can access these profiles directly from the tasktracker's logs
> (through the web UI). For the sake of running this locally I could just
> ignore the error, but I want to be able to profile the job once it is
> deployed to our Hadoop cluster, and I need to retrieve these logs
> automatically. Do I need to change permissions in HDFS to allow for this?
> Any ideas on how to fix this? Thanks in advance,
>
> Best,
> -Leo
>
> --
> Leo Urbina
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Department of Mathematics
> [EMAIL PROTECTED]
>
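
A note on the profile params quoted above: the %s in the string passed to
setProfileParams is substituted by the framework with the name of the
per-attempt profiling output file when the task runs. A self-contained sketch
of that substitution (the attempt id is copied from the error above, purely
for illustration):

```java
public class HprofParams {
    // HPROF agent options as passed to job.setProfileParams(...);
    // Hadoop replaces %s with the task attempt's profile output file.
    static final String PARAMS =
        "-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
      + "force=n,thread=y,verbose=n,file=%s";

    public static void main(String[] args) {
        // Illustrative attempt id, taken from the error in the thread:
        String resolved = String.format(
            PARAMS, "attempt_201203071311_0004_m_000000_0.profile");
        System.out.println(resolved);
    }
}
```

That resolved file is what the HPROF agent writes on the tasktracker, and
what JobClient later tries to download into the client's working directory.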