Re: Hadoop Streaming job Fails - Permission Denied error
Bejoy, to redirect stdout, add the lines

import sys
sys.stdout = sys.stderr

to the top of your .py files (i.e., right after the shebang line).
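
In context, the first lines of a mapper or reducer script would then look something like this (a sketch only; the comments and layout are illustrative, not from the thread):

#!/usr/bin/env python
import sys

# Redirect anything written via sys.stdout to stderr, so stray print
# statements (here or in later imports) land in the task's stderr log
# instead of the streaming pipe that Hadoop reads records from.
sys.stdout = sys.stderr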

J

On Tue, Sep 13, 2011 at 1:42 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:

> Hi Harsh
>          Thank you for the response. I'm on the Cloudera demo VM. It is on
> Hadoop 0.20 and has Python installed. Do I have to do any further
> installation/configuration to get Python running?
>
>
> On Tue, Sep 13, 2011 at 1:36 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>
>> The env binary would be present, but do all your TT nodes have Python
>> properly installed on them? The env program can't find it, and that's
>> probably why your scripts with a shebang line don't run.
>>
>> On Tue, Sep 13, 2011 at 1:12 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>> > Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the
>> > entry point to your mapper and reducer'. Basically I'm a Java Hadoop
>> > developer and have no idea about Python programming. Could you please
>> > help me with more details, like the lines of code I need to include to
>> > achieve this?
>> >
>> > Also, I drilled down further into my error logs and found the
>> > following lines as well:
>> >
>> > stderr logs
>> >
>> > /usr/bin/env: python
>> > : No such file or directory
>> > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess failed with code 127
>> >     at org.apache.hadoop.streaming.PipeMapRed.waitOutputThreads(PipeMapRed.java:362)
>> >     at org.apache.hadoop.streaming.PipeMapRed.mapRedFinished(PipeMapRed.java:572)
>> >     at org.apache.hadoop.streaming.PipeReducer.close(PipeReducer.java:137)
>> >     at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
>> >     at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>> >     at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>> >     at java.security.AccessController.doPrivileged(Native Method)
>> >     at javax.security.auth.Subject.doAs(Subject.java:396)
>> >     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1115)
>> >     at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> > log4j:WARN No appenders could be found for logger (org.apache.hadoop.hdfs.DFSClient).
>> > log4j:WARN Please initialize the log4j system properly.
>> >
>> > I verified that '/usr/bin/env' exists, and it is present.
>> >
>> > Could you please provide a little more guidance on this?
>> >
>> >
>> >
>> > On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi <[EMAIL PROTECTED]> wrote:
>> >>
>> >> Bejoy,
>> >> The other problem I typically ran into with Python streaming jobs was
>> >> when my mapper or reducer wrote to stdout. Since Hadoop Streaming uses
>> >> your script's stdout to pass data back to the framework, any erroneous
>> >> "print" statements will cause the pipe to break. The easiest way around
>> >> this is to redirect "stdout" to "stderr" at the entry point to your
>> >> mapper and reducer; do this even before you import any modules, so that
>> >> even if those modules call "print" it gets redirected.
>> >> Note: if you're using dumbo (but I don't think you are) the above
>> >> solution may not work, but I can send you a pointer.
>> >> J
>> >>
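
One way to apply this advice and still emit records from the mapper is to keep a reference to the original stdout before redirecting it. A minimal word-count-style sketch under that assumption (the variable name and splitting logic are illustrative, not from this thread):

#!/usr/bin/env python
import sys

# Keep the real stdout for emitting key<TAB>value records to Hadoop,
# then point sys.stdout at stderr so any later print output (including
# from imported modules) goes to the stderr log, not the record stream.
hadoop_output = sys.stdout
sys.stdout = sys.stderr

for line in sys.stdin:
    for word in line.split():
        hadoop_output.write('%s\t%d\n' % (word, 1))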
>> >> On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>> >>>
>> >>> Thanks Jeremy. I tried your first suggestion and the mappers ran to
>> >>> completion. But then the reducers failed with another exception related
>> >>> to pipes. I believe it may be due to permission issues again. I tried
>> >>> setting a few additional config parameters but it didn't do the job.
>> >>> Please find the command used and the error logs from the JobTracker
>> >>> web UI below:
>> >>>
>> >>> hadoop jar /usr/lib/hadoop-0.20/contrib/streaming/hadoop-streaming-0.20.2-cdh3u0.jar
>> >>>   -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/
>> >>>   -D dfs.data.dir=/home/streaming/tmp
>> >>>   -D mapred.local.dir=/home/streaming/tmp/local
>> >>>   -D