Thanks for your suggestions.
But in my case I have thousands of small files and I want to read them one
by one. I think that is only possible by using listdir().
As per Nitin's comment I tried to install Pydoop, but it is throwing a
strange error and I am not finding much information on Pydoop on Google.
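As a fallback I am considering shelling out to the hadoop CLI from plain
Python instead of Pydoop. A rough sketch, assuming the hadoop binary is on
the PATH, using my directory from below (parse() is just a placeholder for
my own parsing code):

    import subprocess

    def hdfs_listdir(path):
        # 'hadoop fs -ls' prints one entry per line with the full path
        # in the last column; data lines have 8 columns, which skips
        # the 'Found N items' header.
        out = subprocess.check_output(["hadoop", "fs", "-ls", path])
        files = []
        for line in out.splitlines():
            parts = line.split()
            if len(parts) >= 8:
                files.append(parts[-1])
        return files

    for filename in hdfs_listdir("/user/hdmaster/XML2"):
        # stream each file's contents out of HDFS
        content = subprocess.check_output(["hadoop", "fs", "-cat", filename])
        parse(content)  # placeholder for the actual XML parsing

Would this be a reasonable approach, or is the Pig UDF route still preferred?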
On Sat, Dec 7, 2013 at 8:19 AM, Yigitbasi, Nezih
> You can use TextLoader to read a file in HDFS line by line, and then you
> can pass those lines to your Python UDF. Something like the following
> should work:
> x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> y = foreach x generate my_udf(line);
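> For the UDF itself, a minimal Jython sketch could look like this (the
> my_udf name matches the line above; the file name and output schema are
> just illustrative):
>
>     # my_udfs.py -- register it in the Pig script with:
>     #   register 'my_udfs.py' using jython as my_udfs;
>     # and call it as my_udfs.my_udf(line)
>     @outputSchema("result:chararray")
>     def my_udf(line):
>         # placeholder per-line processing
>         return line.strip()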
> -----Original Message-----
> From: Haider [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 5, 2013 10:12 PM
> To: [EMAIL PROTECTED]
> Subject: Re: listdir() python function is not working on hadoop
> I am trying to read from HDFS, not from the local file system, so would it
> be possible through listdir()? Or is there any way to read HDFS files one
> by one and pass them to a function?
> On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
> <[EMAIL PROTECTED]> wrote:
> > I can call listdir() to read from the local filesystem in a Python UDF.
> > Did you implement your function as a proper UDF?
> > ________________________________________
> > From: Haider [[EMAIL PROTECTED]]
> > Sent: Monday, December 02, 2013 5:22 AM
> > To: [EMAIL PROTECTED]
> > Subject: listdir() python function is not working on hadoop
> > Hi all,
> > Is there anyone who has successfully used the listdir() function to
> > retrieve files one by one from HDFS using a Python script?
> > import os
> >
> > if __name__ == '__main__':
> >     # intended to list the files under an HDFS directory
> >     for filename in os.listdir("/user/hdmaster/XML2"):
> >         print filename
> > ERROR streaming.StreamJob: Job not successful. Error: # of failed Map
> > Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
> > task_201312020139_0025_m_000000
> > 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
> > My intention is to take the files one by one and parse them.
> > Any help or suggestion on this would be much appreciated.
> > Thanks
> > Haider