-
Hadoop Streaming job Fails - Permission Denied error
Bejoy KS 2011-09-12, 09:18
Hi, I wanted to try out Hadoop Streaming and got the sample Python code for a mapper and reducer. I copied both onto my local file system (lfs) and tried running the streaming job as described in the documentation. Here is the command I used to run the job:
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar -input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output -mapper /home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py -reducer /home/cloudera/bejoy/apps/inputs/wc/WcStreamReduce.py
Other than the input and output, all the paths here are lfs locations. However, the job is failing. The error log from the JobTracker URL is as follows:
java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:386)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:324)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 9 more
Caused by: java.lang.RuntimeException: Error in configuring object
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
    at org.apache.hadoop.mapred.MapRunner.configure(MapRunner.java:34)
    ... 14 more
Caused by: java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:88)
    ... 17 more
Caused by: java.lang.RuntimeException: configuration exception
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:230)
    at org.apache.hadoop.streaming.PipeMapper.configure(PipeMapper.java:66)
    ... 22 more
Caused by: java.io.IOException: Cannot run program "/home/cloudera/bejoy/apps/inputs/wc/WcStreamMap.py": java.io.IOException: error=13, Permission denied
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:460)
    at org.apache.hadoop.streaming.PipeMapRed.configure(PipeMapRed.java:214)
    ... 23 more
Caused by: java.io.IOException: java.io.IOException: error=13, Permission denied
    at java.lang.UNIXProcess.<init>(UNIXProcess.java:148)
    at java.lang.ProcessImpl.start(ProcessImpl.java:65)
    at java.lang.ProcessBuilder.start(ProcessBuilder.java:453)
    ... 24 more
On seeing the error I checked the permissions of the mapper and reducer, and issued a chmod 777 command as well. Still no luck.
The permissions of the files are as follows:
cloudera@cloudera-vm:~$ ls -l /home/cloudera/bejoy/apps/inputs/wc/
-rwxrwxrwx 1 cloudera cloudera 707 2011-09-11 23:42 WcStreamMap.py
-rwxrwxrwx 1 cloudera cloudera 1077 2011-09-11 23:42 WcStreamReduce.py
I'm testing this on the Cloudera Demo VM, so the Hadoop setup is in pseudo-distributed mode. Any help would be highly appreciated.
Thank You
Regards Bejoy.K.S
-
Re: Hadoop Streaming job Fails - Permission Denied error
Jeremy Lewi 2011-09-12, 13:20
I would suggest you try putting your mapper/reducer .py files in a directory that is world-readable at every level, e.g. /tmp/test. I had similar problems when I was using streaming, and I believe my workaround was to put the mapper/reducers outside my home directory. The other, more involved alternative is to set up the Linux task controller so you can run your MR jobs as the user who submits the jobs.
J
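A rough way to test the "world-readable at every level" suggestion is to walk up from the script and look at the "others" permission bits on each directory. The sketch below is only an approximation: the function name world_access_problems is made up for illustration, it assumes the task runs as a user who is neither the owner nor in the group of the files, and it ignores ACLs and other security layers.

```python
import os
import stat
import sys


def world_access_problems(script_path):
    """Return (path, reason) pairs that could block an unrelated user.

    Assumption: only the "others" permission bits matter, i.e. the task
    user is neither the owner nor a member of the owning group.
    """
    problems = []
    path = os.path.abspath(script_path)

    # The script itself needs read and execute permission for "others".
    mode = os.stat(path).st_mode
    if not (mode & stat.S_IROTH and mode & stat.S_IXOTH):
        problems.append((path, "file lacks o+rx"))

    # Every ancestor directory needs search (execute) permission for "others".
    parent = os.path.dirname(path)
    while True:
        if not os.stat(parent).st_mode & stat.S_IXOTH:
            problems.append((parent, "directory lacks o+x"))
        next_parent = os.path.dirname(parent)
        if next_parent == parent:  # reached the filesystem root
            break
        parent = next_parent
    return problems


if __name__ == "__main__":
    # Check every path given on the command line and report findings.
    for arg in sys.argv[1:]:
        for problem_path, reason in world_access_problems(arg):
            print("%s: %s" % (problem_path, reason))
```

Run against the mapper path, this would point at any directory above the script that other users cannot traverse, which chmod 777 on the file alone does not change.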
-
Re: Hadoop Streaming job Fails - Permission Denied error
Bejoy KS 2011-09-12, 15:27
Thanks Jeremy. I tried your first suggestion and the mappers ran to completion. But then the reducers failed with another exception related to pipes. I believe it may be due to permission issues again. I tried setting a few additional config parameters but it didn't do the job. Please find below the command used and the error logs from the JobTracker web UI:
hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ -D dfs.data.dir=/home/streaming/tmp -D mapred.local.dir=/home/streaming/tmp/local -D mapred.system.dir=/home/streaming/tmp/system -D mapred.temp.dir=/home/streaming/tmp/temp -input /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output -mapper /home/streaming/WcStreamMap.py -reducer /home/streaming/WcStreamReduce.py
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
The folder permissions at the time of job execution are as follows:
cloudera@cloudera-vm:~$ ls -l /home/streaming/
drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
-rwxrwxrwx 1 root root 707 2011-09-11 23:42 WcStreamMap.py
-rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py
cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp
Am I missing something here?
I haven't been using Linux for long, so I couldn't try your second suggestion of setting up the Linux task controller.
Thanks a lot
Regards Bejoy.K.S
-
Re: Hadoop Streaming job Fails - Permission Denied error
Jeremy Lewi 2011-09-13, 03:36
Bejoy,
The other problem I typically ran into with Python streaming jobs was my mapper or reducer writing unintended output to stdout. Since streaming uses the script's stdout to pass data back to Hadoop, any stray "print" statements will cause the pipe to break. The easiest way around this is to redirect "stdout" to "stderr" at the entry point of your mapper and reducer; do this even before you import any modules, so that even if those modules call "print" the output gets redirected.
Note: if you're using Dumbo (but I don't think you are), the above solution may not work, but I can send you a pointer.
J
-
Re: Hadoop Streaming job Fails - Permission Denied error
Bejoy KS 2011-09-13, 07:42
Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the entry point to your mapper and reducer'. I'm basically a Java Hadoop developer and have no background in Python programming. Could you please help me with more details, such as the line of code I need to include to achieve this?
I also drilled further down into my error logs and found the following line as well:
*stderr logs*
/usr/bin/env: python : No such file or directory
java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
    at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
    at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
    at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
log4j:WARN Please initialize the log4j system properly.
I checked whether such a path exists, and '/usr/bin/env' is present.
Could you please provide a little more guidance on this?
-
Re: Hadoop Streaming job Fails - Permission Denied error
Harsh J 2011-09-13, 08:06
The env binary would be present, but do all your TT (TaskTracker) nodes have Python properly installed on them? The env program can't find it, and that's probably why your scripts with a shebang line don't run.
Harsh J
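Besides a missing interpreter, there is a second possibility worth ruling out. In the stderr log earlier in the thread, "python" and ": No such file or directory" are separated by a gap or line break, which can happen when the shebang line ends with a Windows carriage return, so that env searches for a program literally named "python\r". This is an assumption and is not confirmed anywhere in the thread. The sketch below checks both cases; the function name diagnose_shebang is made up for illustration, and it only inspects the PATH of the process that runs it, which may differ from the PATH the tasks see.

```python
import os
import shutil


def diagnose_shebang(script_path):
    """Return a list of findings about the first line of a script."""
    with open(script_path, "rb") as handle:
        first_line = handle.readline()

    if not first_line.startswith(b"#!"):
        return ["no shebang line"]

    findings = []
    if first_line.endswith(b"\r\n"):
        # With CRLF endings the interpreter name carries a trailing "\r".
        findings.append("shebang ends with CRLF; env would look for 'python\\r'")

    # split() discards the line ending, so we see the intended words.
    words = first_line[2:].decode("utf-8", "replace").split()
    if not words:
        findings.append("shebang names no interpreter")
        return findings

    # For "#!/usr/bin/env NAME" the interpreter is NAME, looked up on PATH.
    # (Options passed to env itself are not handled in this sketch.)
    if os.path.basename(words[0]) == "env" and len(words) > 1:
        interpreter = words[1]
    else:
        interpreter = words[0]

    if shutil.which(interpreter) is None:
        findings.append("interpreter %r not found on PATH" % interpreter)
    return findings
```

An empty list means neither problem was detected. A CRLF finding would suggest converting the script to Unix line endings before resubmitting the job.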
-
Re: Hadoop Streaming job Fails - Permission Denied error
Bejoy KS 2011-09-13, 08:42
Hi Harsh, thank you for the response. I'm on the Cloudera demo VM. It runs Hadoop 0.20 and has Python installed. Do I have to do any further installation/configuration to get Python running?
-
Re: Hadoop Streaming job Fails - Permission Denied error
Jeremy Lewi 2011-09-13, 20:09
Bejoy, to redirect stdout, add the lines
import sys
sys.stdout = sys.stderr
to the top of your .py files (i.e. right after the shebang line).
J