springring 2013-01-12, 08:30
Hi,
When I run the code below as a streaming job, the job fails with error N/A and gets killed. Stepping through it, I found that it fails at "file_obj = open(file)". When I run the same code outside of Hadoop, everything works.

    #!/bin/env python

    import sys

    for line in sys.stdin:
        offset,filename = line.split("\t")
        file = "hdfs://user/hdfs/catalog3/" + filename
        print line
        print filename
        print file
        file_obj = open(file)
        ..................................

-
Re: python streaming error
Nitin Pawar 2013-01-12, 08:34
Is this the correct HDFS path?
"hdfs://user/hdfs/catalog3/"
I don't see the NameNode information in the path. Could that cause the issue? It's just a guess, but I'd expect something like hdfs://host:port/path.
On Sat, Jan 12, 2013 at 12:30 AM, springring <[EMAIL PROTECTED]> wrote:
> hdfs://user/hdfs/catalog3/
-- Nitin Pawar
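Nitin's point about the missing NameNode can be made concrete. In generic URI syntax (RFC 3986), everything between "hdfs://" and the next "/" is the authority (host and optional port), so in hdfs://user/hdfs/catalog3/ the string "user" is read as the host and the path loses its /user prefix. The sketch below uses Python 3's urllib.parse.urlparse as a stand-in for how such a URI is split; hdfs_authority is a hypothetical helper name. Even a well-formed URI does not make the built-in open() understand HDFS, as a later reply in this thread points out.

```python
from urllib.parse import urlparse


def hdfs_authority(uri):
    """Return (host, port, path) as a generic URI parser splits an hdfs:// string.

    Hypothetical helper for illustration; it does not contact HDFS.
    """
    parts = urlparse(uri)
    return parts.hostname, parts.port, parts.path


# Without a host, "user" becomes the host and the path starts at /hdfs:
print(hdfs_authority("hdfs://user/hdfs/catalog3/part-00000"))
# → ('user', None, '/hdfs/catalog3/part-00000')

# With host and port, the path keeps its /user prefix:
print(hdfs_authority("hdfs://computeb-13:9100/user/hdfs/catalog3/part-00000"))
# → ('computeb-13', 9100, '/user/hdfs/catalog3/part-00000')
```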
-
Re:Re: python streaming error
springring 2013-01-12, 08:55
Hi,
I modified the file as below, but there is still an error:

    #!/bin/env python

    import sys

    for line in sys.stdin:
        offset,filename = line.split("\t")
        file = "hdfs://computeb-13:9100/user/hdfs/catalog3/" + filename
        print line
        print filename
        print file
        file_obj = open(file)

-
Re: Re: python streaming error
Nitin Pawar 2013-01-12, 08:58
computeb-13 is not a valid hostname.
Maybe, if you have a local Hadoop installation, you can refer to it as hdfs://localhost:9100/ or hdfs://127.0.0.1:9100.
If it's on another machine, just try the IP address of that machine.
-- Nitin Pawar
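Whether a hostname such as computeb-13 is usable can be checked on a task node itself rather than guessed. A minimal sketch, assuming Python 3 and that port 9100 (from the thread) is the NameNode RPC port: it tries to open a TCP connection, which fails both when the name does not resolve and when nothing is listening on the port. namenode_reachable is a hypothetical helper, and a successful connection only shows that something is listening, not that it is the NameNode.

```python
import socket


def namenode_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds from this machine.

    Hypothetical helper: it only tests name resolution and TCP reachability.
    """
    try:
        # create_connection resolves the name and connects; the socket is
        # closed again when the with-block exits.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # socket.gaierror (unresolvable name), ConnectionRefusedError and
        # socket.timeout are all subclasses of OSError in Python 3.
        return False
```

For example, run namenode_reachable("computeb-13", 9100) on a task node; if it returns False while namenode_reachable("<NameNode IP>", 9100) returns True, the hostname is the problem and the IP address can be used instead.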
-
Re:Re: Re: python streaming error
springring 2013-01-14, 01:27
Hi, I found the key point. It is not the hostname; that is correct. I just changed "offset,filename = line.split("\t")" to "offset,filename = line.strip().split("\t")" and now it passes.
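The change matters because each line read from sys.stdin keeps its trailing newline, so with a plain split("\t") the newline ends up at the end of filename and therefore inside the path passed to open(). A minimal Python 3 sketch, assuming each input line has the form offset<TAB>filename as the original split implies; parse_record is a hypothetical name. It strips only line endings (strip() would also remove spaces that might belong to a filename) and splits at the first tab only. This cleans up the path, but as the next message shows, it does not by itself make open() work on an hdfs:// URI.

```python
def parse_record(line):
    """Split one 'offset<TAB>filename' input line into (offset, filename).

    Hypothetical helper; the record layout is assumed from the original code.
    """
    # Remove only the line ending ("\n" or "\r\n"), then split at the first
    # tab so any later tab stays part of the filename.
    offset, filename = line.rstrip("\r\n").split("\t", 1)
    return offset, filename


# The naive split keeps the newline in the filename:
print(repr("0\tpart-00000\n".split("\t")[1]))  # → 'part-00000\n'
print(parse_record("0\tpart-00000\n"))          # → ('0', 'part-00000')
```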
-
Re:Re:Re: Re: python streaming error
springring 2013-01-14, 01:53
Sorry, the error persists even after I changed the code to "offset,filename = line.strip().split("\t")".
-
Re: python streaming error
Andy Isaacson 2013-01-14, 22:24
Oh, another link I should have included!
http://blog.cloudera.com/blog/2013/01/a-guide-to-python-frameworks-for-hadoop/

-andy

On Mon, Jan 14, 2013 at 2:19 PM, Andy Isaacson <[EMAIL PROTECTED]> wrote:
> Hadoop Streaming does not magically teach Python open() how to read
> from "hdfs://" URLs. You'll need to use a library or fork a
> "hdfs dfs -cat" to read the file for you.
>
> A few links that may help:
>
> http://www.michael-noll.com/tutorials/writing-an-hadoop-mapreduce-program-in-python/
> http://stackoverflow.com/questions/12485718/python-read-file-as-stream-from-hdfs
> https://bitbucket.org/turnaev/cyhdfs
>
> -andy
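Andy's second suggestion, forking "hdfs dfs -cat", can be sketched in Python 3 with subprocess.check_output. This assumes the hdfs command is on the PATH of every task node and configured for the cluster; HDFS_DIR, read_hdfs_lines, and map_records are hypothetical names, and emitting a per-file line count is only an illustrative mapper output. Whole files are read into memory and decoded as UTF-8, so this suits small text files; for large files, stream the process output or use one of the libraries linked above.

```python
import subprocess

# Directory taken from the thread; adjust host, port and path for your cluster.
HDFS_DIR = "hdfs://computeb-13:9100/user/hdfs/catalog3/"


def read_hdfs_lines(path, cat_cmd=("hdfs", "dfs", "-cat")):
    """Return the lines of an HDFS file by forking `hdfs dfs -cat PATH`.

    Reads the whole file into memory. cat_cmd is a hypothetical parameter
    so the command can be swapped out (e.g. for `hadoop fs -cat`).
    """
    # check_output raises CalledProcessError if the command exits non-zero,
    # for example when the file does not exist, so the task fails loudly.
    out = subprocess.check_output(list(cat_cmd) + [path])
    return out.decode("utf-8").splitlines()


def map_records(records, cat_cmd=("hdfs", "dfs", "-cat")):
    """Yield one 'filename<TAB>line_count' output line per input record."""
    for record in records:
        # Drop the line ending before building the path, then split once.
        _offset, filename = record.rstrip("\r\n").split("\t", 1)
        lines = read_hdfs_lines(HDFS_DIR + filename, cat_cmd)
        yield "%s\t%d\n" % (filename, len(lines))
```

In the streaming mapper script itself, loop over map_records(sys.stdin) and write each returned line to sys.stdout; Streaming then treats the text before the tab as the key.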
-
Re: python streaming error
Simone Leo 2013-01-15, 09:44
Hello,
You can use the Pydoop HDFS API to work with HDFS files:

    >>> import pydoop.hdfs as hdfs
    >>> with hdfs.open('hdfs://localhost:8020/user/myuser/filename') as f:
    ...     for line in f:
    ...         do_something(line)

As you can see, the API is very similar to that of ordinary Python file objects. Check out the following tutorial for more details:
http://pydoop.sourceforge.net/docs/tutorial/hdfs_api.html

Note that Pydoop also has a MapReduce API, so you can use it to rewrite the whole program:
http://pydoop.sourceforge.net/docs/tutorial/mapred_api.html

It also has a more compact and easy-to-use scripting engine for simple applications:
http://pydoop.sourceforge.net/docs/tutorial/pydoop_script.html

If you think Pydoop is right for you, read the installation guide:
http://pydoop.sourceforge.net/docs/installation.html

Simone

--
Simone Leo
Data Fusion - Distributed Computing
CRS4
POLARIS - Building #1
Piscina Manna
I-09010 Pula (CA) - Italy
e-mail: [EMAIL PROTECTED]
http://www.crs4.it