Pig >> mail # user >> listdir() python function is not working on hadoop


Re: listdir() python function is not working on hadoop
Hi All

    Thanks for your suggestions.
But in my case I have thousands of small files and I want to read them one
by one. I think that is only possible by using listdir().
As per Nitin's comment I tried to install Pydoop, but it is throwing some
strange errors and I am not finding any information on Pydoop on Google.
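For reference, one way to enumerate HDFS files from Python without Pydoop is to shell out to the `hadoop fs -ls` client (assuming the hadoop binary is on PATH). This is only a sketch; `hdfs_listdir` and `parse_ls_output` are hypothetical helper names, and the directory path is the one from this thread:

```python
# Sketch: list an HDFS directory from Python by shelling out to the
# `hadoop fs -ls` CLI instead of os.listdir(), which only sees the
# local filesystem. Helper names here are hypothetical.
import subprocess

def parse_ls_output(text):
    """Extract the full paths from `hadoop fs -ls` output (last column)."""
    paths = []
    for line in text.splitlines():
        parts = line.split()
        # File entries have 8 columns; this skips the "Found N items" header.
        if len(parts) >= 8:
            paths.append(parts[-1])
    return paths

def hdfs_listdir(path):
    """Return the paths under an HDFS directory, e.g. /user/hdmaster/XML2."""
    out = subprocess.check_output(['hadoop', 'fs', '-ls', path])
    return parse_ls_output(out.decode('utf-8'))
```

The returned paths can then be opened one at a time with `hadoop fs -cat` or handed to whatever parses each file.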

thanks
Haider
On Sat, Dec 7, 2013 at 8:19 AM, Yigitbasi, Nezih
<[EMAIL PROTECTED]>wrote:

> Haider,
> You can use TextLoader to read a file in HDFS line by line, and then you
> can pass those lines to your python UDF. Something like the following
> should work:
>
> x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> y = foreach x generate my_udf(line);
>
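For completeness, a minimal sketch of what `my_udf` (the placeholder name from the Pig snippet above) might contain. In a real Jython UDF an `@outputSchema('value:chararray')` decorator would declare the return schema once the script is registered with Pig; it is omitted here so the function runs standalone, and the tab-separated line format is an assumption:

```python
# Hypothetical body for my_udf (name taken from the Pig snippet above).
# When registered with Pig, e.g.:
#   register 'my_udfs.py' using jython as udfs;
# an @outputSchema('value:chararray') decorator would declare the schema.
def my_udf(line):
    """Parse one tab-separated line (assumed format); return its first field."""
    fields = line.split('\t')
    return fields[0].strip()
```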
> -----Original Message-----
> From: Haider [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 5, 2013 10:12 PM
> To: [EMAIL PROTECTED]
> Subject: Re: listdir() python function is not working on hadoop
>
> I am trying to read from HDFS, not from the local file system, so would it
> be possible through listdir()? Or is there any way to read HDFS files one
> by one and pass them to one function?
>
>
>
>
> On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
> <[EMAIL PROTECTED]>wrote:
>
> > I can call listdir() to read from the local filesystem in a Python UDF.
> > Did you implement your function as a proper UDF?
> > ________________________________________
> > From: Haider [[EMAIL PROTECTED]]
> > Sent: Monday, December 02, 2013 5:22 AM
> > To: [EMAIL PROTECTED]
> > Subject: listdir() python function is not working on hadoop
> >
> > Hi all
> >
> >    Is there anyone who has successfully used the listdir() function to
> > retrieve files one by one from HDFS using a Python script?
> >
> >
> >  import os
> >
> >  if __name__ == '__main__':
> >      for filename in os.listdir("/user/hdmaster/XML2"):
> >          print filename
> >
> > ERROR streaming.StreamJob: Job not successful. Error: # of failed Map
> > Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
> > task_201312020139_0025_m_000000
> > 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
> >
> > My intention is to take the files one by one and parse them.
> >
> > Any help or suggestion on this would be much appreciated.
> >
> > Thanks
> > Haider
> >
>