Pig, mail # user - listdir() python function is not working on hadoop


Re: listdir() python function is not working on hadoop
Haider 2013-12-06, 06:12
I am trying to read from HDFS, not from the local file system, so would that
be possible through listdir? Or is there any way to read HDFS files one by one
and pass each one to a function?
On Fri, Dec 6, 2013 at 4:20 AM, Yigitbasi, Nezih
<[EMAIL PROTECTED]> wrote:

> I can call listdir to read from the local filesystem in a Python UDF. Did
> you implement your function as a proper UDF?
> ________________________________________
> From: Haider [[EMAIL PROTECTED]]
> Sent: Monday, December 02, 2013 5:22 AM
> To: [EMAIL PROTECTED]
> Subject: listdir() python function is not working on hadoop
>
> Hi all,
>
> Has anyone successfully used the listdir() function to retrieve files one
> by one from HDFS using a Python script?
>
>
> import os
>
> if __name__ == '__main__':
>     # os.listdir() lists a directory on the local filesystem.
>     for filename in os.listdir("/user/hdmaster/XML2"):
>         print(filename)
>
> ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks
> exceeded allowed limit. FailedCount: 1. LastFailedTask:
> task_201312020139_0025_m_000000
> 13/12/02 05:20:50 INFO streaming.StreamJob: killJob...
>
> My intention is to take files one by one to parse.
>
> Any help or suggestions on this would be very helpful to me.
>
> Thanks
> Haider
>
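
On the question of reading HDFS files one by one and passing each to a function: os.listdir() only looks at the local filesystem, so one possible workaround is to shell out to the hadoop command-line client. The following is only a rough sketch under several assumptions: the `hadoop` binary is on the PATH of the machine running the script, the `hadoop fs -ls` output uses the usual eight-column layout with the path last, and `parse_file` is a hypothetical placeholder for the real parsing step. It is not taken from the thread.

```python
import subprocess


def parse_ls_output(text):
    """Extract file paths from the text printed by `hadoop fs -ls`."""
    paths = []
    for line in text.splitlines():
        # File entries start with '-'. This skips the "Found N items"
        # header and directory entries (which start with 'd').
        if not line.startswith("-"):
            continue
        # Assumed columns: permissions, replication, owner, group,
        # size, date, time, path.
        fields = line.split(None, 7)
        if len(fields) == 8:
            paths.append(fields[7])
    return paths


def list_hdfs_files(hdfs_dir):
    """List the files in an HDFS directory via the hadoop CLI."""
    output = subprocess.check_output(["hadoop", "fs", "-ls", hdfs_dir])
    return parse_ls_output(output.decode("utf-8"))


def read_hdfs_file(path):
    """Return the raw bytes of one HDFS file via `hadoop fs -cat`."""
    return subprocess.check_output(["hadoop", "fs", "-cat", path])


def parse_file(path, data):
    # Hypothetical placeholder for the per-file parsing step.
    print("%s: %d bytes" % (path, len(data)))


def process_directory(hdfs_dir):
    """Hand each file in hdfs_dir to parse_file, one at a time."""
    for path in list_hdfs_files(hdfs_dir):
        parse_file(path, read_hdfs_file(path))
```

With a working Hadoop client, calling process_directory("/user/hdmaster/XML2") would hand each file in that directory to parse_file in turn. Keeping the output parsing in its own function means it can be checked without a cluster.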