Re: Hadoop Streaming job Fails - Permission Denied error
Bejoy,

The other problem I typically ran into with Python streaming jobs was my
mapper or reducer writing to stdout. Since Hadoop Streaming uses stdout to
pass data from your script back to the framework, any stray "print"
statement can break the pipe. The easiest way around this is to redirect
stdout to stderr at the entry point of your mapper and reducer; do it even
before you import any modules, so that if those modules call "print" the
output gets redirected too.
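Roughly, what I mean is something like this (just a sketch of a word-count
style mapper, not your actual WcStreamMap.py -- the "emit" handle and the
rest of the structure are illustrative):

#!/usr/bin/env python
import sys

# Keep a handle on the real stdout for emitting key/value pairs, then
# point sys.stdout at stderr so any stray "print" (here or in modules
# imported afterwards) cannot corrupt the Streaming pipe.
emit = sys.stdout
sys.stdout = sys.stderr

# ... any other imports go here, after the redirect ...

def main():
    for line in sys.stdin:
        for word in line.split():
            # Write records through the saved handle, never via print.
            emit.write("%s\t1\n" % word)

if __name__ == "__main__":
    main()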

Note: if you're using dumbo (though I don't think you are), the above
solution may not work, but I can send you a pointer.

J

On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:

> Thanks Jeremy. I tried your first suggestion and the mappers ran to
> completion, but then the reducers failed with another exception related to
> pipes. I believe it may be due to permission issues again. I tried setting a
> few additional config parameters, but that didn't do the job. Please find the
> command used and the error logs from the JobTracker web UI below.
>
> hadoop  jar
> /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
> -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ -D
> dfs.data.dir=/home/streaming/tmp -D
> mapred.local.dir=/home/streaming/tmp/local -D
> mapred.system.dir=/home/streaming/tmp/system -D
> mapred.temp.dir=/home/streaming/tmp/temp -input
> /userdata/bejoy/apps/wc/input -output /userdata/bejoy/apps/wc/output
> -mapper /home/streaming/WcStreamMap.py  -reducer
> /home/streaming/WcStreamReduce.py
>
>
> java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
> failed with code 127
>     at
> org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
>     at
> org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
>     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
>     at
> org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
>     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>
>     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>     at java.security.AccessController.doPrivileged(Native Method)
>     at javax.security.auth.Subject.doAs(Subject.java:396)
>     at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>
>
> The folder permissions at the time of job execution are as follows
>
> cloudera@cloudera-vm:~$ ls -l  /home/streaming/
> drwxrwxrwx 5 root root 4096 2011-09-12 05:59 tmp
> -rwxrwxrwx 1 root root  707 2011-09-11 23:42 WcStreamMap.py
> -rwxrwxrwx 1 root root 1077 2011-09-11 23:42 WcStreamReduce.py
>
> cloudera@cloudera-vm:~$ ls -l /home/streaming/tmp/
> drwxrwxrwx 2 root root 4096 2011-09-12 06:12 hadoop
> drwxrwxrwx 2 root root 4096 2011-09-12 05:58 local
> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 system
> drwxrwxrwx 2 root root 4096 2011-09-12 05:59 temp
>
> Am I missing something here?
>
> I haven't been using Linux for long, so I couldn't try your second
> suggestion of setting up the Linux task controller.
>
> Thanks a lot
>
> Regards
> Bejoy.K.S
>
>
>
>
> On Mon, Sep 12, 2011 at 6:20 AM, Jeremy Lewi <[EMAIL PROTECTED]> wrote:
>
>> I would suggest you try putting your mapper/reducer .py files in a
>> directory that is world-readable at every level, e.g. /tmp/test. I had
>> similar problems when I was using streaming, and I believe my workaround
>> was to put the mappers/reducers outside my home directory. The other, more
>> involved alternative is to set up the Linux task controller so you can run
>> your MR jobs as the user who submits them.
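>>
>> If it helps, a quick way to check the "at every level" part is something
>> like the following (a hypothetical helper, not part of Hadoop -- it just
>> walks up the directory tree and flags anything "other" users cannot read
>> or traverse):
>>
>> import os
>> import stat
>>
>> def check_world_accessible(path):
>>     # Walk from the given file up to "/" and report any level that is
>>     # not readable (and, for directories, traversable) by "other" users.
>>     path = os.path.abspath(path)
>>     while True:
>>         mode = os.stat(path).st_mode
>>         need = stat.S_IROTH | (stat.S_IXOTH if stat.S_ISDIR(mode) else 0)
>>         if (mode & need) != need:
>>             print("not world-accessible: %s" % path)
>>         parent = os.path.dirname(path)
>>         if parent == path:
>>             break
>>         path = parent
>>
>> check_world_accessible("/home/streaming/WcStreamMap.py")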
>>
>> J
>>
>>
>> On Mon, Sep 12, 2011 at 2:18 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>       I wanted to try out Hadoop streaming and got the sample Python code
>>> for the mapper and reducer. I copied both into my local file system and
>>> tried running the streaming job as mentioned in the documentation.
>>> Here is the command I used to run the job:
>>>
>>> hadoop  jar
>>> /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar