Hadoop, mail # dev - python streaming error


springring 2013-01-12, 08:30
Nitin Pawar 2013-01-12, 08:34
springring 2013-01-12, 08:55
Nitin Pawar 2013-01-12, 08:58
springring 2013-01-14, 01:27
springring 2013-01-14, 01:53
Andy Isaacson 2013-01-14, 22:19
Re: python streaming error
Andy Isaacson 2013-01-14, 22:24
Oh, another link I should have included!
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/

-andy

On Mon, Jan 14, 2013 at 2:19 PM, Andy Isaacson <[EMAIL PROTECTED]> wrote:
> Hadoop Streaming does not magically teach Python open() how to read
> from "hdfs://" URLs. You'll need to use a library or fork a "hdfs dfs
> -cat" to read the file for you.
>
> A few links that may help:
>
> http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
> http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
> https://bitbucket.org/turnaev/cyhdfs
>
> -andy
>
> On Sat, Jan 12, 2013 at 12:30 AM, springring <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>>      When I run the code below as a streaming job, it fails with error "N/A" and the job is killed. Stepping through it, I found that it fails at
>> "file_obj = open(file)". When I run the same code outside of Hadoop, everything works.
>>
>> #!/bin/env python
>>
>> import sys
>>
>> for line in sys.stdin:
>>     offset,filename = line.split("\t")
>>     file = "hdfs://user/hdfs/catalog3/" + filename
>>     print line
>>     print filename
>>     print file
>>     file_obj = open(file)
>> ..................................
>>
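A rough sketch of the workaround Andy describes (forking "hdfs dfs -cat" rather than calling open() on an hdfs:// URL). The helper name and the configurable cat command are illustrative, not from the thread; note also that line.split("\t") in the original mapper leaves a trailing newline on filename, which would break the path even with a working HDFS reader:

```python
import subprocess
import sys

def hdfs_open(path, cat_cmd=("hdfs", "dfs", "-cat")):
    # Python's built-in open() cannot read hdfs:// URLs, so fork the
    # given cat command and return its stdout as a file-like object.
    # cat_cmd is parameterized only so the sketch can be exercised
    # without a Hadoop install.
    proc = subprocess.Popen(list(cat_cmd) + [path], stdout=subprocess.PIPE)
    return proc.stdout

# In the mapper, strip the trailing newline before building the path:
# for line in sys.stdin:
#     offset, filename = line.rstrip("\n").split("\t")
#     file_obj = hdfs_open("hdfs://user/hdfs/catalog3/" + filename)
```

(The hdfs:// path prefix is the one from the original question; whether it needs a namenode host or a third slash depends on the cluster configuration.)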
Simone Leo 2013-01-15, 09:44