Hadoop >> mail # user >> Profiling Hadoop Job


Re: Profiling Hadoop Job
Hi Leo,

Thanks for pointing out the outdated README file. I'm glad to tell you that we
do support the old API in the latest version. See here:

http://www.cs.duke.edu/starfish/previous.html

You're welcome to join our mailing list, where your questions will reach more
of our group members.

Jie

On Wed, Mar 7, 2012 at 3:37 PM, Leonardo Urbina <[EMAIL PROTECTED]> wrote:

> Hi Jie,
>
> According to the Starfish README, Hadoop programs must be written using
> the new Hadoop API. This is not my case (I am using MultipleInputs among
> other features not yet supported by the new API). Is there any way around this?
> Thanks,
>
> -Leo
>
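For context, here is a minimal sketch of the kind of old-API MultipleInputs setup Leo describes; the driver, mapper classes, and input/output paths are hypothetical stand-ins, not taken from his job:

    import java.io.IOException;

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;
    import org.apache.hadoop.mapred.MapReduceBase;
    import org.apache.hadoop.mapred.Mapper;
    import org.apache.hadoop.mapred.OutputCollector;
    import org.apache.hadoop.mapred.Reporter;
    import org.apache.hadoop.mapred.TextInputFormat;
    import org.apache.hadoop.mapred.lib.MultipleInputs;

    public class OldApiMultipleInputsSketch {

        // Trivial stand-ins for the per-source mappers.
        public static class MapperA extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {
            public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
                    throws IOException {
                out.collect(new Text("a"), value);
            }
        }

        public static class MapperB extends MapReduceBase
                implements Mapper<LongWritable, Text, Text, Text> {
            public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter)
                    throws IOException {
                out.collect(new Text("b"), value);
            }
        }

        public static void main(String[] args) throws IOException {
            JobConf conf = new JobConf(OldApiMultipleInputsSketch.class);
            conf.setJobName("old-api-multiple-inputs");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(Text.class);

            // The old-API feature in question: each input path is bound
            // to its own InputFormat and Mapper.
            MultipleInputs.addInputPath(conf, new Path("/data/a"),
                    TextInputFormat.class, MapperA.class);
            MultipleInputs.addInputPath(conf, new Path("/data/b"),
                    TextInputFormat.class, MapperB.class);
            FileOutputFormat.setOutputPath(conf, new Path("/data/out"));

            JobClient.runJob(conf);
        }
    }
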
> On Wed, Mar 7, 2012 at 3:19 PM, Jie Li <[EMAIL PROTECTED]> wrote:
>
> > Hi Leonardo,
> >
> > You might want to try Starfish, which supports memory profiling as well
> > as CPU/disk/network profiling for performance tuning.
> >
> > Jie
> > ------------------
> > Starfish is an intelligent performance tuning tool for Hadoop.
> > Homepage: www.cs.duke.edu/starfish/
> > Mailing list: http://groups.google.com/group/hadoop-starfish
> >
> >
> > On Wed, Mar 7, 2012 at 2:36 PM, Leonardo Urbina <[EMAIL PROTECTED]> wrote:
> >
> > > Hello everyone,
> > >
> > > I have a Hadoop job that I run on several GBs of data that I am trying to
> > > optimize in order to reduce the memory consumption as well as improve the
> > > speed. I am following the steps outlined in Tom White's "Hadoop: The
> > > Definitive Guide" for profiling using HPROF (p. 161), by setting the
> > > following properties in the JobConf:
> > >
> > >        job.setProfileEnabled(true);
> > >        job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
> > >                "force=n,thread=y,verbose=n,file=%s");
> > >        job.setProfileTaskRange(true, "0-2");
> > >        job.setProfileTaskRange(false, "0-2");
> > >
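As a side note, the same setup can be written against the raw configuration keys that these setters wrap (a sketch assuming the 0.20-era mapred.task.profile* property names), which is convenient when passing -D options on the command line instead of calling the JobConf setters:

    import org.apache.hadoop.mapred.JobConf;

    public class ProfileConfSketch {
        public static void main(String[] args) {
            JobConf conf = new JobConf();
            // Equivalent to setProfileEnabled/setProfileParams/
            // setProfileTaskRange, written against the underlying
            // 0.20-era configuration keys.
            conf.setBoolean("mapred.task.profile", true);
            conf.set("mapred.task.profile.params",
                    "-agentlib:hprof=cpu=samples,heap=sites,depth=6," +
                    "force=n,thread=y,verbose=n,file=%s");
            conf.set("mapred.task.profile.maps", "0-2");
            conf.set("mapred.task.profile.reduces", "0-2");
        }
    }
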
> > > I am trying to run this locally on a single pseudo-distributed install of
> > > Hadoop (0.20.2), and it gives the following error:
> > >
> > > Exception in thread "main" java.io.FileNotFoundException:
> > > attempt_201203071311_0004_m_000000_0.profile (Permission denied)
> > >        at java.io.FileOutputStream.open(Native Method)
> > >        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
> > >        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
> > >        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
> > >        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
> > >        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
> > >        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
> > >        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
> > >        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> > >        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> > >        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> > >        at java.lang.reflect.Method.invoke(Method.java:597)
> > >        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)
> > >
> > > However, I can access these logs directly from the tasktracker (through
> > > the web UI). For the sake of running this locally, I could just ignore
> > > this error; however, I want to be able to profile the job once deployed
> > > to our Hadoop cluster, and I need to be able to automatically retrieve
> > > these logs. Do I need to change the permissions in HDFS to allow for
> > > this? Any ideas on how to fix this? Thanks in advance,
> > >
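Judging from the stack trace above (a java.io.FileOutputStream opened inside JobClient.downloadProfile), the .profile files are written to the submitting client's current working directory on the local filesystem, not to HDFS, so the Permission denied most likely means that directory is not writable by the user running the job. A minimal preflight check, offered as an illustration rather than a confirmed fix:

    import java.io.File;

    public class ProfileDirCheck {
        public static void main(String[] args) {
            // JobClient downloads each attempt_*.profile into the client
            // JVM's working directory, which must therefore be writable.
            File cwd = new File(System.getProperty("user.dir"));
            System.out.println("Working directory: " + cwd);
            System.out.println("Writable: " + cwd.canWrite());
        }
    }
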
> > > Best,
> > > -Leo
> > >
> > > --
> > > Leo Urbina
> > > Massachusetts Institute of Technology
> > > Department of Electrical Engineering and Computer Science
> > > Department of Mathematics