Hadoop >> mail # dev >> python streaming error


Re: python streaming error
Hadoop Streaming does not magically teach Python's open() how to read
from "hdfs://" URLs. You'll need to use an HDFS client library or spawn an
"hdfs dfs -cat" subprocess to read the file for you.
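
A minimal sketch of the subprocess route, in case it helps. It assumes the
"hdfs" launcher is on the PATH of every task node. The names cat_hdfs and
run_mapper, and their cat_cmd, prefix, and out parameters, are made up for
this illustration and are not part of Hadoop. It also assumes the intended
directory is /user/hdfs/catalog3/, written without the "hdfs://" scheme,
since in "hdfs://user/..." the word "user" would be parsed as the NameNode
host. The filename is stripped of its trailing newline before use.

```python
import subprocess
import sys

# Assumption: the "hdfs" launcher is on the PATH of every task node.
HDFS_CAT = ("hdfs", "dfs", "-cat")


def cat_hdfs(path, cat_cmd=HDFS_CAT):
    """Yield lines of `path` as printed by the command `cat_cmd path`."""
    proc = subprocess.Popen(list(cat_cmd) + [path],
                            stdout=subprocess.PIPE,
                            universal_newlines=True)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Always reap the child, even if the caller stops reading early.
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise IOError("%s exited with status %d"
                      % (" ".join(cat_cmd), returncode))


def run_mapper(stdin=sys.stdin, prefix="/user/hdfs/catalog3/",
               cat_cmd=HDFS_CAT, out=sys.stdout):
    """For each "offset<TAB>filename" input line, copy that file to `out`."""
    for line in stdin:
        # Strip the newline so it does not end up inside the file name.
        offset, filename = line.rstrip("\n").split("\t")
        for record in cat_hdfs(prefix + filename, cat_cmd):
            out.write(record)
```

To use it as the streaming mapper, call run_mapper() at the bottom of the
script. Starting one "hdfs dfs -cat" per input line means one JVM start per
file, so this is only reasonable when the number of files is small.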

A few links that may help:

http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
https://bitbucket.org/turnaev/cyhdfs

-andy

On Sat, Jan 12, 2013 at 12:30 AM, springring <[EMAIL PROTECTED]> wrote:
> Hi,
>
>      When I run the code below as a streaming job, the job fails with error N/A and is killed. Stepping through it, I found that it fails at
> " file_obj = open(file) " .  When I run the same code outside of Hadoop, everything works.
>
> #!/bin/env python
>
> import sys
>
> for line in sys.stdin:
>     offset,filename = line.split("\t")
>     file = "hdfs://user/hdfs/catalog3/" + filename
>     print line
>     print filename
>     print file
>     file_obj = open(file)
> ..................................
>