Pig >> mail # user >> listdir() python function is not working on hadoop


Re: listdir() python function is not working on hadoop
Hi All

    Thanks for your suggestions.
But in my case I have thousands of small files and I want to read them one by
one. I think it is only possible by using listdir().
As per Nitin's comment I tried to install Pydoop, but it is throwing me some
strange error and I am not finding any information on Pydoop on Google.

Thanks
Haider
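
(A minimal sketch of the Pydoop route mentioned above, assuming pydoop.hdfs
installs correctly and exposes ls(), open(), read() and close() as in Pydoop's
HDFS API; the directory path is taken from the original post and the parsing
step is left as a placeholder.)

    import pydoop.hdfs as hdfs

    # List the HDFS directory, then read each file's contents one at a time.
    for path in hdfs.ls("/user/hdmaster/XML2"):
        f = hdfs.open(path)          # file-like object backed by HDFS
        try:
            content = f.read()
        finally:
            f.close()
        # parse `content` here, one file per iteration
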
On Sat, Dec 7, 2013 at 8:19 AM, Yigitbasi, Nezih
<[EMAIL PROTECTED]> wrote:

> Haider,
> You can use TextLoader to read a file in HDFS line by line, and then you
> can pass those lines to your python UDF. Something like the following
> should work:
>
> x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> y = foreach x generate my_udf(line);
>
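
(A rough sketch of what the my_udf referenced in the snippet above might look
like as a Jython UDF registered with Pig; the outputSchema decorator is
assumed to be injected by Pig's Jython engine, and the body is only a
placeholder for the real per-line processing.)

    # save as my_udf.py and register it in the Pig script, e.g.:
    #   register 'my_udf.py' using jython as myfuncs;
    #   y = foreach x generate myfuncs.my_udf(line);

    @outputSchema("processed:chararray")
    def my_udf(line):
        # Each call receives one line of the HDFS file loaded by TextLoader.
        if line is None:
            return None
        return line.strip()   # placeholder; replace with the actual parsing
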
> -----Original Message-----
> From: Haider [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, December 5, 2013 10:12 PM
> To: [EMAIL PROTECTED]
> Subject: Re: listdir() python function is not working on hadoop
>
> I am trying to read from HDFS, not from the local file system, so would it
> be possible through listdir? Or is there any other way to read HDFS files
> one by one and pass them to a function?
>
>
>
>
> On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
> <[EMAIL PROTECTED]> wrote:
>
> > I can call listdir to read from the local filesystem in a Python UDF. Did
> > you implement your function as a proper UDF?
> > ________________________________________
> > From: Haider [[EMAIL PROTECTED]]
> > Sent: Monday, December 02, 2013 5:22 AM
> > To: [EMAIL PROTECTED]
> > Subject: listdir() python function is not working on hadoop
> >
> > Hi all
> >
> >    Has anyone successfully used the listdir() function to
> > retrieve files one by one from HDFS using a Python script?
> >
> >
> > import os
> >
> > if __name__ == '__main__':
> >     # os.listdir() only sees the local filesystem of the node running the
> >     # task, not HDFS, so this path is not found when the job runs.
> >     for filename in os.listdir("/user/hdmaster/XML2"):
> >         print filename
> >
> > ERROR streaming.StreamJob: Job not successful. Error: # of failed Map
> > Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
> > task_201312020139_0025_m_000000
> > 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
> >
> > My intention is to read the files one by one and parse them.
> >
> > Any help or suggestion on this would be much appreciated.
> >
> > Thanks
> > Haider
> >
>