Re: Profiling Hadoop Job
Hi Leonardo,

You might want to try Starfish, which supports memory profiling as well
as CPU/disk/network profiling for performance tuning.

Jie
------------------
Starfish is an intelligent performance tuning tool for Hadoop.
Homepage: www.cs.duke.edu/starfish/
Mailing list: http://groups.google.com/group/hadoop-starfish
On Wed, Mar 7, 2012 at 2:36 PM, Leonardo Urbina <[EMAIL PROTECTED]> wrote:

> Hello everyone,
>
> I have a Hadoop job, run over several GB of data, that I am trying to
> optimize to reduce memory consumption and improve speed. I am following
> the steps outlined in Tom White's "Hadoop: The Definitive Guide" for
> profiling with HPROF (p. 161), setting the following properties on the
> JobConf (a property-based equivalent is sketched after this message):
>
>        job.setProfileEnabled(true);
>        job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
>                "force=n,thread=y,verbose=n,file=%s");
>        job.setProfileTaskRange(true, "0-2");
>        job.setProfileTaskRange(false, "0-2");
>
> I am trying to run this locally on a single-node, pseudo-distributed
> install of Hadoop (0.20.2), and it gives the following error:
>
> Exception in thread "main" java.io.FileNotFoundException:
> attempt_201203071311_0004_m_000000_0.profile (Permission denied)
>        at java.io.FileOutputStream.open(Native Method)
>        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
>        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
>        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
>        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
>        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
>        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
>        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
>        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
>        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
>        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>        at java.lang.reflect.Method.invoke(Method.java:597)
>        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
>
> However, I can access the profile output directly in the tasktracker's
> logs (through the web UI). For the sake of running this locally I could
> just ignore this error, but I want to be able to profile the job once it
> is deployed to our Hadoop cluster, so I need to retrieve these logs
> automatically (a retrieval sketch follows this message). Do I need to
> change the permissions in HDFS to allow for this? Any ideas on how to
> fix this? Thanks in advance,
>
> Best,
> -Leo
>
> --
> Leo Urbina
> Massachusetts Institute of Technology
> Department of Electrical Engineering and Computer Science
> Department of Mathematics
> [EMAIL PROTECTED]
>
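
A note on the JobConf calls quoted above: in Hadoop 0.20.x these
convenience methods are thin wrappers around configuration keys, so the
same profiling setup can also be written against the configuration
directly (for instance from mapred-site.xml or -D options). A minimal
sketch, assuming the old-API JobConf of 0.20.x; AggregatorDriver is the
driver class named in the stack trace:

       import org.apache.hadoop.mapred.JobConf;

       // Minimal sketch (Hadoop 0.20.x): property-based equivalent of the
       // setProfile* convenience methods quoted above.
       JobConf conf = new JobConf(AggregatorDriver.class);
       conf.setBoolean("mapred.task.profile", true);   // setProfileEnabled(true)
       conf.set("mapred.task.profile.params",          // setProfileParams(...)
           "-agentlib:hprof=cpu=samples,heap=sites,depth=6,force=n,thread=y,verbose=n,file=%s");
       conf.set("mapred.task.profile.maps", "0-2");    // setProfileTaskRange(true, "0-2")
       conf.set("mapred.task.profile.reduces", "0-2"); // setProfileTaskRange(false, "0-2")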
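
On the "Permission denied" itself: the stack trace shows
java.io.FileOutputStream, i.e. JobClient.downloadProfile is writing
attempt_*.profile into the client's local working directory, so the
failing write is on the local filesystem rather than HDFS; launching the
job from a directory the submitting user can write to should avoid the
exception. To retrieve profile output from a cluster automatically, one
option is to read the same tasktracker log servlet that backs the web UI.
A minimal sketch, assuming the 0.20-era servlet on the default port
50060; the host and attempt ID are placeholders, and some releases name
the query parameter taskid rather than attemptid:

       import java.io.BufferedReader;
       import java.io.InputStreamReader;
       import java.net.URL;

       public class FetchProfile {
           public static void main(String[] args) throws Exception {
               String tracker = "tasktracker-host:50060";               // placeholder host:port
               String attempt = "attempt_201203071311_0004_m_000000_0"; // placeholder attempt ID
               // filter=profile selects the HPROF output; plaintext=true drops the HTML wrapper.
               URL url = new URL("http://" + tracker + "/tasklog?attemptid=" + attempt
                       + "&filter=profile&plaintext=true");
               BufferedReader in = new BufferedReader(new InputStreamReader(url.openStream()));
               String line;
               while ((line = in.readLine()) != null) {
                   System.out.println(line);
               }
               in.close();
           }
       }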