Pig, mail # user - listdir() python function is not wokring on hadoop


Re: listdir() python function is not wokring on hadoop
Haider 2013-12-07, 07:19
Running "python setup.py build" gives this error:

Packaging Java classes
sh: 1: jar: not found
error: Error packaging java component. Command: jar -cf build/lib.linux-i686-2.7/pydoop/pydoop_1_1_2.jar -C build/temp.linux-i686-2.7/pipes-1.1.2 ./it
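
The "sh: 1: jar: not found" line means the shell could not find the jar tool, which ships with the JDK. As a minimal sketch (the helper name find_tool is hypothetical, and shutil.which needs Python 3.3 or later), the following checks whether jar is on the PATH before the build is retried:

```python
import shutil


def find_tool(name):
    """Return the full path of an executable found on PATH, or None if missing."""
    return shutil.which(name)


jar_path = find_tool("jar")
if jar_path is None:
    # The build step shells out to `jar`, so it fails without it.
    print("jar not found on PATH; install a JDK or add its bin directory to PATH")
else:
    print("jar found at", jar_path)
```

If the check reports that jar is missing, installing a JDK (not just a JRE) or adding the JDK's bin directory to PATH would be the first thing to try.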

On Sat, Dec 7, 2013 at 12:00 PM, Nitin Pawar <[EMAIL PROTECTED]>wrote:

> Can you share the error?
> On Dec 7, 2013 8:49 AM, "Haider" <[EMAIL PROTECTED]> wrote:
>
> > Hi All
> >
> >     Thanks for your suggestions.
> > But in my case I have thousands of small files and I want to read them one
> > by one. I think that is only possible with listdir().
> > Following Nitin's comment I tried to install Pydoop, but it throws a
> > strange error and I cannot find any information about Pydoop on Google.
> >
> > thanks
> > Haider
> >
> >
> >
> >
> > On Sat, Dec 7, 2013 at 8:19 AM, Yigitbasi, Nezih
> > <[EMAIL PROTECTED]>wrote:
> >
> > > Haider,
> > > You can use TextLoader to read a file in HDFS line by line, and then you
> > > can pass those lines to your python UDF. Something like the following
> > > should work:
> > >
> > > x = load '/tmp/my_file_on_hdfs' using TextLoader() as (line:chararray);
> > > y = foreach x generate my_udf(line);
> > >
> > > -----Original Message-----
> > > From: Haider [mailto:[EMAIL PROTECTED]]
> > > Sent: Thursday, December 5, 2013 10:12 PM
> > > To: [EMAIL PROTECTED]
> > > Subject: Re: listdir() python function is not wokring on hadoop
> > >
> > > I am trying to read from HDFS, not from the local file system. Would that
> > > be possible through listdir? Or is there any way to read HDFS files one by
> > > one and pass each to a function?
> > >
> > >
> > >
> > >
> > > On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
> > > <[EMAIL PROTECTED]>wrote:
> > >
> > > > I can call listdir to read from local filesystem in a python UDF. Did
> > > > you implement your function as a proper UDF?
> > > > ________________________________________
> > > > From: Haider [[EMAIL PROTECTED]]
> > > > Sent: Monday, December 02, 2013 5:22 AM
> > > > To: [EMAIL PROTECTED]
> > > > Subject: listdir() python function is not wokring on hadoop
> > > >
> > > > Hi all
> > > >
> > > >    Has anyone successfully used the listdir() function to retrieve
> > > > files one by one from HDFS in a Python script?
> > > >
> > > >
> > > > import os
> > > >
> > > > if __name__ == '__main__':
> > > >     for filename in os.listdir("/user/hdmaster/XML2"):
> > > >         print filename
> > > >
> > > > ERROR streaming.StreamJob: Job not successful. Error: # of failed Map
> > > > Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask:
> > > > task_201312020139_0025_m_000000
> > > > 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
> > > >
> > > > My intention is to take files one by one to parse.
> > > >
> > > > Any help or suggestions would be much appreciated.
> > > >
> > > > Thanks
> > > > Haider
> > > >
> > >
> >
>
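
For reference, os.listdir() only sees the local filesystem of the node the script runs on, so it cannot list an HDFS directory. One alternative that needs no extra package is to call the hadoop command-line client from Python. The sketch below is hedged: the function names are hypothetical, it assumes the hadoop command is on the PATH, and it assumes "hadoop fs -ls" prints eight whitespace-separated columns per entry with the path last, which may vary between Hadoop versions.

```python
import subprocess


def parse_ls_output(text):
    """Extract file paths from the text printed by `hadoop fs -ls`."""
    paths = []
    for line in text.splitlines():
        fields = line.split(None, 7)
        # Entry lines are assumed to have 8 columns; the "Found N items"
        # header has fewer. Entries whose permissions start with "d" are
        # directories and are skipped.
        if len(fields) == 8 and not fields[0].startswith("d"):
            paths.append(fields[7])
    return paths


def list_hdfs_files(directory):
    """List the files in an HDFS directory via the hadoop client."""
    out = subprocess.check_output(["hadoop", "fs", "-ls", directory])
    return parse_ls_output(out.decode("utf-8"))


def read_hdfs_file(path):
    """Return the raw bytes of one HDFS file via `hadoop fs -cat`."""
    return subprocess.check_output(["hadoop", "fs", "-cat", path])
```

With these helpers, looping over list_hdfs_files("/user/hdmaster/XML2") and passing each path to read_hdfs_file would hand the files to a parser one at a time. Starting a client process per file is slow for thousands of files, so this suits a driver script better than a streaming mapper.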