Hadoop Streaming does not magically teach Python's open() how to read
from "hdfs://" URLs; open() only understands local filesystem paths.
You'll need to use an HDFS client library, or spawn an "hdfs dfs -cat"
subprocess to read the file for you.
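
For the subprocess route, something along these lines might work. It is
only a rough sketch: it assumes the "hdfs" command is on the PATH of every
task node, and the names iter_hdfs_lines and run_mapper are made up for
illustration. It also assumes the files live under /user/hdfs/catalog3/ on
the default filesystem; a URL written as "hdfs://user/..." would be parsed
with "user" as the namenode host, which is probably not what was intended.

```python
import subprocess
import sys

# Assumption: the `hdfs` CLI is installed and on PATH on the task nodes.
HDFS_CAT = ("hdfs", "dfs", "-cat")


def iter_hdfs_lines(path, cat_cmd=HDFS_CAT):
    """Yield the lines of `path` by streaming the stdout of `hdfs dfs -cat`."""
    proc = subprocess.Popen(list(cat_cmd) + [path], stdout=subprocess.PIPE)
    try:
        for raw in proc.stdout:
            yield raw.decode("utf-8", errors="replace").rstrip("\n")
    finally:
        # Always release the pipe and reap the child, even on early exit.
        proc.stdout.close()
        returncode = proc.wait()
    if returncode != 0:
        raise RuntimeError(
            "%s exited with status %d" % (" ".join(cat_cmd), returncode)
        )


def run_mapper(records, base_dir="/user/hdfs/catalog3/",
               cat_cmd=HDFS_CAT, out=sys.stdout):
    """For each 'offset<TAB>filename' record, copy that file's lines to `out`."""
    for record in records:
        # Strip the trailing newline first, otherwise it ends up in the
        # filename and the lookup fails.
        offset, filename = record.rstrip("\n").split("\t", 1)
        for line in iter_hdfs_lines(base_dir + filename, cat_cmd):
            out.write(line + "\n")
```

In a streaming mapper script this would be driven with
run_mapper(sys.stdin). The cat_cmd parameter exists only so the command can
be swapped out, for example with plain "cat" when trying the script locally.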
A few links that may help:
On Sat, Jan 12, 2013 at 12:30 AM, springring <[EMAIL PROTECTED]> wrote:
> When I run the code below as a streaming job, the job fails with error N/A and is killed. Running it step by step, I found that it fails at
> " file_obj = open(file) " . When I run the same code outside of Hadoop, everything is OK.
> #!/bin/env python
>
> import sys
>
> for line in sys.stdin:
>     offset,filename = line.split("\t")
>     file = "hdfs://user/hdfs/catalog3/" + filename
>     print line
>     print filename
>     print file
>     file_obj = open(file)