-
Profiling Hadoop Job
Leonardo Urbina 2012-03-07, 19:36
Hello everyone,

I have a Hadoop job that I run on several GBs of data, and I am trying to optimize it to reduce memory consumption and improve speed. I am following the steps outlined in Tom White's "Hadoop: The Definitive Guide" for profiling using HPROF (p161), by setting the following properties in the JobConf:

    job.setProfileEnabled(true);
    job.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,depth=6,"
        + "force=n,thread=y,verbose=n,file=%s");
    job.setProfileTaskRange(true, "0-2");   // map tasks 0-2
    job.setProfileTaskRange(false, "0-2");  // reduce tasks 0-2

I am running this locally on a single pseudo-distributed install of Hadoop (0.20.2), and it gives the following error:

    Exception in thread "main" java.io.FileNotFoundException: attempt_201203071311_0004_m_000000_0.profile (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

However, I can access these logs directly from the tasktracker's logs (through the web UI).

For the sake of running this locally, I could just ignore this error. However, I want to be able to profile the job once it is deployed to our Hadoop cluster, and I need to be able to retrieve these logs automatically. Do I need to change the permissions in HDFS to allow for this? Any ideas on how to fix this? Thanks in advance,

Best,
-Leo

--
Leo Urbina
Massachusetts Institute of Technology
Department of Electrical Engineering and Computer Science
Department of Mathematics
[EMAIL PROTECTED]
-
Re: Profiling Hadoop Job
Jie Li 2012-03-07, 20:19
Hi Leonardo,
You might want to try Starfish, which supports memory profiling as well as CPU/disk/network profiling for performance tuning.

Jie
------------------
Starfish is an intelligent performance tuning tool for Hadoop.
Homepage: www.cs.duke.edu/starfish/
Mailing list: http://groups.google.com/group/hadoop-starfish
-
Re: Profiling Hadoop Job
Leonardo Urbina 2012-03-07, 20:37
Hi Jie,
According to the Starfish README, Hadoop programs must be written using the new Hadoop API. That is not my case (I am using MultipleInputs, among other features that the new API does not support). Is there any way around this? Thanks,

-Leo
-
Re: Profiling Hadoop Job
Jie Li 2012-03-07, 20:47
Hi Leo,
Thanks for pointing out the outdated README file. Glad to tell you that we do support the old API in the latest version. See here:

http://www.cs.duke.edu/starfish/previous.html

You are welcome to join our mailing list, where your questions will reach more of our group members.

Jie
-
Re: Profiling Hadoop Job
Leonardo Urbina 2012-03-07, 20:52
Thanks,
-Leo
-
Re: Profiling Hadoop Job
Leonardo Urbina 2012-03-08, 23:13
Does anyone have any idea how to solve this problem? Regardless of whether I'm using plain HPROF or profiling through Starfish, I am getting the same error:

    Exception in thread "main" java.io.FileNotFoundException: attempt_201203071311_0004_m_000000_0.profile (Permission denied)
        at java.io.FileOutputStream.open(Native Method)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:194)
        at java.io.FileOutputStream.<init>(FileOutputStream.java:84)
        at org.apache.hadoop.mapred.JobClient.downloadProfile(JobClient.java:1226)
        at org.apache.hadoop.mapred.JobClient.monitorAndPrintJob(JobClient.java:1302)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1251)
        at com.BitSight.hadoopAggregator.AggregatorDriver.run(AggregatorDriver.java:89)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at com.BitSight.hadoopAggregator.AggregatorDriver.main(AggregatorDriver.java:94)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

But I can't find what permissions to change to fix this issue. Any ideas? Thanks in advance,

Best,
-Leo
-
Re: Profiling Hadoop Job
Mohit Anchlia 2012-03-09, 00:10
Can you check which user you are running this process as and compare it with the ownership of the directory?
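One way to make that comparison from inside the JVM, rather than with shell tools, is sketched below. It prints the user the process runs as next to the owner of the current working directory. The class name `WhoOwnsCwd` and the helper `ownerOf` are made up for this illustration, and the sketch assumes Java 7 or later for `java.nio.file` (the stack trace in this thread comes from an older JVM, where `ls -ld .` and `whoami` would give the same information).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Hadoop: shows who is running the JVM
// and who owns the directory the JVM was started in.
public class WhoOwnsCwd {

    /** Returns the owner name of dir, or a "(unknown: ...)" note if it cannot be read. */
    public static String ownerOf(Path dir) {
        try {
            return Files.getOwner(dir).getName();
        } catch (IOException e) {
            // Covers missing paths and I/O failures while reading attributes.
            return "(unknown: " + e.getMessage() + ")";
        } catch (UnsupportedOperationException e) {
            return "(unknown: file system does not expose owners)";
        }
    }

    public static void main(String[] args) {
        String user = System.getProperty("user.name");
        Path cwd = Paths.get(System.getProperty("user.dir"));
        System.out.println("running as : " + user);
        System.out.println("working dir: " + cwd);
        System.out.println("dir owner  : " + ownerOf(cwd));
        System.out.println("writable   : " + Files.isWritable(cwd));
    }
}
```

If "running as" and "dir owner" differ and "writable" is false, the process probably cannot create new files in that directory.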
-
Re: Profiling Hadoop Job
Vinod Kumar Vavilapalli 2012-03-09, 00:37
The JobClient is trying to download the profile output to the local directory. It seems like you don't have write permissions in the current working directory where you are running the JobClient. Please check that.

HTH.

+Vinod
Hortonworks Inc.
http://hortonworks.com/
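The stack trace in this thread shows JobClient.downloadProfile opening a FileOutputStream on a bare file name (attempt_..._0.profile), which is consistent with the file being created relative to the directory the client was launched from. A driver could check for this before submitting the job. The following is only a sketch under that assumption: `ProfileDirCheck` and `canCreateFileIn` are hypothetical names, not Hadoop APIs, and the code sticks to `java.io.File` so it should also compile on the older JVM seen in the trace.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical pre-flight check, not part of Hadoop: verifies that the
// current working directory accepts new files before a profiled job is run.
public class ProfileDirCheck {

    /** Returns true if a file can be created in dir, tested by creating a probe file. */
    public static boolean canCreateFileIn(File dir) {
        if (!dir.isDirectory()) {
            return false;
        }
        try {
            File probe = File.createTempFile("profile-probe", ".tmp", dir);
            probe.delete(); // best-effort cleanup of the probe file
            return true;
        } catch (IOException e) {
            return false; // e.g. "Permission denied"
        } catch (SecurityException e) {
            return false; // a security manager refused the write
        }
    }

    public static void main(String[] args) {
        File cwd = new File(System.getProperty("user.dir"));
        if (!canCreateFileIn(cwd)) {
            System.err.println("Cannot write to " + cwd
                + "; downloading task profiles here would likely fail.");
            System.exit(1);
        }
        System.out.println("OK: " + cwd + " is writable.");
        // A real driver would go on to submit the job from here.
    }
}
```

Creating a real probe file is used here instead of `File.canWrite()` because it exercises the same operation that failed in the trace. Launching the job from a directory the submitting user owns avoids the problem altogether.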
-
Re: Profiling Hadoop Job
Leonardo Urbina 2012-04-18, 20:03
Sorry it took so long to respond, but that did solve it. Thanks!