Bejoy, to redirect stdout, add the lines below to the top of your .py files (i.e., right after the shebang line).
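A minimal sketch of those lines (assuming the script still needs to emit
records to Hadoop, so the real stdout is saved first):

    import sys

    # Send any stray print output to stderr so it cannot corrupt the
    # streaming pipe; keep a handle on the real stdout first so the
    # script can still emit key/value records back to Hadoop.
    hadoop_out = sys.stdout
    sys.stdout = sys.stderr

    # Emit records through the saved handle, e.g.:
    # hadoop_out.write('%s\t%s\n' % (key, value))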
On Tue, Sep 13, 2011 at 1:42 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
> Hi Harsh
> Thank you for the response. I'm on the Cloudera demo VM. It runs
> Hadoop 0.20 and has Python installed. Do I have to do any further
> installation/configuration to get Python running?
> On Tue, Sep 13, 2011 at 1:36 PM, Harsh J <[EMAIL PROTECTED]> wrote:
>> The env binary would be present, but do all your TT nodes have Python
>> properly installed on them? The env program can't find it, and that's
>> probably why your scripts with the shebang don't run.
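>> For reference, the shebang in question is the first line of the script,
>> and `python` has to resolve on the PATH of every TaskTracker node:
>>
>>     #!/usr/bin/env python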
>> On Tue, Sep 13, 2011 at 1:12 PM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>> > Thanks Jeremy. But I didn't follow 'redirect "stdout" to "stderr" at the
>> > entry point to your mapper and reducer'.
>> > Basically I'm a Java Hadoop developer and have no idea about Python.
>> > Could you please help me with more details, like the lines of code I
>> > need to include to achieve this.
>> > Also, I drilled deeper into my error logs and found the
>> > following lines as well:
>> > stderr logs
>> > /usr/bin/env: python
>> > : No such file or directory
>> > java.lang.RuntimeException: PipeMapRed.waitOutputThreads(): subprocess
>> > failed with code 127
>> > at ...
>> > at ...
>> > at ...
>> > at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:478)
>> > at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:416)
>> > at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
>> > at java.security.AccessController.doPrivileged(Native Method)
>> > at javax.security.auth.Subject.doAs(Subject.java:396)
>> > at ...
>> > at org.apache.hadoop.mapred.Child.main(Child.java:262)
>> > log4j:WARN No appenders could be found for logger
>> > (org.apache.hadoop.hdfs.DFSClient).
>> > log4j:WARN Please initialize the log4j system properly.
>> > I verified the existence of that path, and '/usr/bin/env' is present.
>> > Could you please provide a little more guidance on the same?
>> > On Tue, Sep 13, 2011 at 9:06 AM, Jeremy Lewi <[EMAIL PROTECTED]> wrote:
>> >> Bejoy,
>> >> The other problem I typically ran into using Python streaming jobs was
>> >> that my mapper or reducer wrote to stdout. Since streaming uses stdout
>> >> to pass data back to Hadoop, any erroneous "print" statements will
>> >> cause the pipe to break. The easiest way around this is to redirect
>> >> "stdout" to "stderr" at the entry point to your mapper and reducer; do
>> >> this even before you import any modules, so that even if those modules
>> >> call "print" the output gets redirected.
>> >> Note: if you're using dumbo (but I don't think you are) the above
>> >> may not work, but I can send you a pointer.
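>> >> Something like this at the very top of the script (a sketch, assuming
>> >> the mapper emits through a handle saved before the redirect):
>> >>
>> >>     import sys
>> >>
>> >>     # Save the real stdout for emitting records, then send any
>> >>     # stray print output to stderr instead of the streaming pipe.
>> >>     hadoop_out = sys.stdout
>> >>     sys.stdout = sys.stderr
>> >>
>> >>     # A trivial word-count-style mapper body, for illustration:
>> >>     for line in sys.stdin:
>> >>         key = line.strip()
>> >>         if key:
>> >>             hadoop_out.write('%s\t%s\n' % (key, 1))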
>> >> J
>> >> On Mon, Sep 12, 2011 at 8:27 AM, Bejoy KS <[EMAIL PROTECTED]> wrote:
>> >>> Thanks Jeremy. I tried your first suggestion and the mappers ran
>> >>> to completion. But then the reducers failed with another exception
>> >>> related to pipes. I believe it may be due to permission issues again.
>> >>> I tried setting a few additional config parameters, but that didn't
>> >>> do the job. Please find the command used and the error logs from the
>> >>> jobtracker web UI below:
>> >>> hadoop jar
>> >>> -D hadoop.tmp.dir=/home/streaming/tmp/hadoop/ -D
>> >>> dfs.data.dir=/home/streaming/tmp -D
>> >>> mapred.local.dir=/home/streaming/tmp/local -D