HDFS, mail # dev - Why failed to use Distcp over FTP protocol?


Re: Why failed to use Distcp over FTP protocol?
sam liu 2013-04-25, 02:37
I can execute both of the following:
- hadoop fs -ls ftp://ftpuser:ftpuser@hostname/tmp/testdir
- hadoop fs -lsr ftp://ftpuser:ftpuser@hostname/tmp/testdir

Are there any special requirements on the FTP configuration for running the
distcp tool? In my environment, if I issue
'hadoop fs -lsr ftp://ftpuser:ftpuser@hostname', it returns the root path of
my Linux file system.
2013/4/24 Daryn Sharp <[EMAIL PROTECTED]>

>  Listing the root is a bit of a special case that is different than N-many
> directories deep.  Can you list
> ftp://hadoopadm:xxxxxxxx@ftphostname/some/dir/file or
> ftp://hadoopadm:xxxxxxxx@ftphostname/some/dir?  I suspect ftp fs has a
> bug, so they will fail too.
>
>  On Apr 23, 2013, at 8:03 PM, sam liu wrote:
>
>  I can successfully execute "hadoop fs -ls
> ftp://hadoopadm:xxxxxxxx@ftphostname"; it returns the root path of the
> Linux system.
>
> But I failed to execute "hadoop fs -rm
> ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here", which returns:
> rm: Delete failed ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here
>
>
> 2013/4/24 Daryn Sharp <[EMAIL PROTECTED]>
>
>> The ftp fs is listing the contents of the given path's parent directory,
>> and then trying to match the basename of each child path returned against
>> the basename of the given path, which is quite inefficient. The FNF
>> (FileNotFoundException) means it didn't find a match for the basename. It
>> may be that the ftp server isn't returning a listing in exactly the
>> expected format, so it's being parsed incorrectly.
>>
>>  Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"
>> work?  Or "hadoop fs -rm
>> ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"?  Those cmds should
>> exercise the same code paths where you are experiencing errors.
>>
>>  Daryn
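
To illustrate the lookup described above, here is a minimal Python sketch (not Hadoop's actual Java code) of resolving a path by listing its parent directory and matching basenames. The names `get_file_status`, `list_dir`, `well_parsed`, and `badly_parsed`, the dict-based entries, and the explicit root branch are all hypothetical and chosen only for illustration. The trailing-space mangling merely stands in for whatever a real parsing mismatch might look like.

```python
import posixpath

# Hypothetical directory contents of an FTP server, keyed by directory path.
LISTINGS = {
    "/": [{"name": "tmp", "is_dir": True}],
    "/tmp": [{"name": "testdir", "is_dir": True}],
}


def well_parsed(directory):
    """Return the entries of `directory` with names parsed correctly."""
    return LISTINGS.get(directory, [])


def badly_parsed(directory):
    """Simulate a listing parser that mangles names (assumed failure mode)."""
    return [dict(entry, name=entry["name"] + " ")
            for entry in LISTINGS.get(directory, [])]


def get_file_status(path, list_dir):
    """Look up `path` by listing its parent and matching on the basename."""
    path = posixpath.normpath(path)
    if path == "/":
        # The root has no parent to list, so it needs its own branch.
        return {"name": "/", "is_dir": True}
    parent, name = posixpath.split(path)
    # One full listing of the parent is needed for every single lookup.
    for entry in list_dir(parent):
        if entry["name"] == name:
            return entry
    # No child of the parent matched the basename.
    raise FileNotFoundError(path)
```

With `well_parsed`, `get_file_status("/tmp/testdir", well_parsed)` returns the matching entry. With `badly_parsed`, the same call raises `FileNotFoundError` while `"/"` still resolves. That pattern resembles the symptoms reported in this thread, but the sketch does not confirm the actual cause.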
>>
>>  On Apr 22, 2013, at 9:06 PM, sam liu wrote:
>>
>>  I encountered IOException and FileNotFoundException:
>>
>> 13/04/17 17:11:10 INFO mapred.JobClient: Task Id : attempt_201304160910_2135_m_000000_0, Status : FAILED
>> java.io.IOException: The temporary job-output directory ftp://hadoopadm:xxxxxxxx@ftphostname/tmp/_distcp_logs_i74spu/_temporary doesn't exist!
>>     at org.apache.hadoop.mapred.FileOutputCommitter.getWorkPath(FileOutputCommitter.java:250)
>>     at org.apache.hadoop.mapred.FileOutputFormat.getTaskOutputPath(FileOutputFormat.java:244)
>>     at org.apache.hadoop.mapred.TextOutputFormat.getRecordWriter(TextOutputFormat.java:116)
>>     at org.apache.hadoop.mapred.MapTask$DirectMapOutputCollector.<init>(MapTask.java:820)
>>     at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:430)
>>     at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
>>     at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>>     at java.security.AccessController.doPrivileged(AccessController.java:310)
>>     at javax.security.auth.Subject.doAs(Subject.java:573)
>>     at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1144)
>>     at org.apache.hadoop.mapred.Child.main(Child.java:249)
>>
>>
>> ... ...
>>
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job complete: job_201304160910_2135
>> 13/04/17 17:11:42 INFO mapred.JobClient: Counters: 6
>> 13/04/17 17:11:42 INFO mapred.JobClient:   Job Counters
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Failed map tasks=1
>> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_MAPS=33785
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Launched map tasks=4
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
>> 13/04/17 17:11:42 INFO mapred.JobClient:     SLOTS_MILLIS_REDUCES=6436
>> 13/04/17 17:11:42 INFO mapred.JobClient: Job Failed: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: