Hadoop, mail # dev - python streaming error


springring 2013-01-12, 08:30
Nitin Pawar 2013-01-12, 08:34
springring 2013-01-12, 08:55
Nitin Pawar 2013-01-12, 08:58
springring 2013-01-14, 01:27
springring 2013-01-14, 01:53
Re: python streaming error
Andy Isaacson 2013-01-14, 22:19
Hadoop Streaming does not magically teach Python's open() how to read
from "hdfs://" URLs. You'll need to use a library, or fork an
"hdfs dfs -cat" process to read the file for you.
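The "fork an hdfs dfs -cat" option can be sketched in Python 3 with the standard subprocess module. This is a minimal sketch, assuming the `hdfs` client is installed and on the PATH of every task node (the quoted script is Python 2, so it would need porting). The generator name `iter_hdfs_lines` and its `cat_cmd` parameter are hypothetical; `cat_cmd` exists only so the function can be tried off-cluster with plain `cat`.

```python
import subprocess
from typing import Iterator, Sequence


def iter_hdfs_lines(path: str,
                    cat_cmd: Sequence[str] = ("hdfs", "dfs", "-cat")) -> Iterator[str]:
    """Yield the lines of a file by streaming the stdout of `hdfs dfs -cat <path>`.

    `cat_cmd` is a hypothetical hook: it defaults to the HDFS CLI (assumed to be
    on PATH) but can be swapped for ("cat",) to test against a local file.
    """
    # Popen's context manager closes the pipe and waits for the child on exit.
    with subprocess.Popen([*cat_cmd, path], stdout=subprocess.PIPE, text=True) as proc:
        yield from proc.stdout
    # Reached only when the output was fully consumed: surface a failed read
    # (e.g. a missing file) instead of silently yielding nothing.
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, proc.args)
```

In the mapper, `for record in iter_hdfs_lines(path): ...` would take the place of `open(file)`. Reading through a pipe avoids holding a whole file in the task's memory, and because the exit status is checked after the output is drained, a missing file raises `CalledProcessError` rather than looking like an empty file.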

A few links that may help:

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
https://bitbucket.org/turnaev/cyhdfs

-andy

On Sat, Jan 12, 2013 at 12:30 AM, springring <[EMAIL PROTECTED]> wrote:
> Hi,
>
>      When I run the code below as a streaming job, the job fails with "N/A" and is killed. Running it step by step, I found that it fails at
> "file_obj = open(file)". When I run the same code outside of Hadoop, everything is OK.
>
> #!/bin/env python
>
> import sys
>
> for line in sys.stdin:
>     offset,filename = line.split("\t")
>     file = "hdfs://user/hdfs/catalog3/" + filename
>     print line
>     print filename
>     print file
>     file_obj = open(file)
> ..................................
>
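Separately from open(), two details in the quoted mapper would still cause trouble once the file can be read. Each line from sys.stdin keeps its trailing newline, so `filename` ends in "\n" after `line.split("\t")`; and in `hdfs://user/hdfs/catalog3/` the `user` part is parsed as the NameNode host, not as a directory. A sketch of the parsing step, assuming `offset<TAB>filename` input as in the original script and assuming the intended directory is `/user/hdfs/catalog3` (a plain path that `hdfs dfs` resolves against the cluster's default file system). `parse_record`, `catalog_path` and `BASE_DIR` are hypothetical names.

```python
import posixpath
from typing import Tuple

# Assumed target directory. Written as a plain path rather than
# "hdfs://user/...", where "user" would be taken as the NameNode host.
BASE_DIR = "/user/hdfs/catalog3"


def parse_record(line: str) -> Tuple[str, str]:
    """Split an "offset<TAB>filename" input line into its two fields."""
    # Drop the newline that iterating over sys.stdin leaves on each line,
    # and split only on the first tab.
    offset, filename = line.rstrip("\n").split("\t", 1)
    return offset, filename


def catalog_path(filename: str, base_dir: str = BASE_DIR) -> str:
    """Build the full path of a catalog file under base_dir."""
    return posixpath.join(base_dir, filename)
```

Note also that in a streaming mapper everything printed to stdout becomes job output, so debugging prints like the ones above are better written to sys.stderr, which ends up in the task logs.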

Andy Isaacson 2013-01-14, 22:24
Simone Leo 2013-01-15, 09:44