HDFS, mail # dev - Why failed to use Distcp over FTP protocol?


Re: Why failed to use Distcp over FTP protocol?
sam liu 2013-04-24, 02:34
If I execute 'hadoop distcp hdfs:///tmp/test1.txt
ftp://ftpuser:ftpuser@hostname/tmp/', I get the following exception:
attempt_201304222240_0006_m_000000_1: log4j:ERROR Could not connect to remote log4j server at [localhost]. We will try again later.
13/04/23 19:31:33 INFO mapred.JobClient: Task Id : attempt_201304222240_0006_m_000000_2, Status : FAILED
java.io.IOException: Cannot rename parent(source): ftp://ftpuser:ftpuser@hostname/tmp/_distcp_logs_o6gzfy/_temporary/_attempt_201304222240_0006_m_000000_2, parent(destination): ftp://ftpuser:[EMAIL PROTECTED]/tmp/_distcp_logs_o6gzfy
        at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:547)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:512)
        at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:154)
        at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172)
        at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132)
        at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:221)
        at org.apache.hadoop.mapred.Task.commit(Task.java:1019)
        at org.apache.hadoop.mapred.Task.done(Task.java:889)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:373)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(AccessController.java:310)
        at javax.security.auth.Subject.doAs(Subject.java:573)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)

2013/4/24 sam liu <[EMAIL PROTECTED]>

> Now I can successfully run "hadoop distcp ftp://ftpuser:ftpuser@hostname/tmp/test1.txt
> hdfs:///tmp/test1.txt".
>
> But "hadoop distcp hdfs:///tmp/test1.txt
> ftp://ftpuser:ftpuser@hostname/tmp/test1.txt.v1" fails with an error like:
> attempt_201304222240_0005_m_000000_1: log4j:ERROR Could not connect to remote log4j server at [localhost]. We will try again later.
> 13/04/23 18:59:05 INFO mapred.JobClient: Task Id : attempt_201304222240_0005_m_000000_2, Status : FAILED
> java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
>         at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(AccessController.java:310)
>         at javax.security.auth.Subject.doAs(Subject.java:573)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> 2013/4/24 sam liu <[EMAIL PROTECTED]>
>
>> I can successfully execute "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname",
>> and it lists the root directory of the Linux system.
>>
>> But "hadoop fs -rm ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"
>> fails, and it returns:
>> rm: Delete failed ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here
>>
>>
>> 2013/4/24 Daryn Sharp <[EMAIL PROTECTED]>
>>
>>> The ftp fs lists the contents of the given path's parent directory and
>>> then tries to match the basename of each returned child path against the
>>> basename of the given path, which is quite inefficient. The FNF
>>> (FileNotFoundException) means it didn't find a match for the basename. It
>>> may be that the ftp server isn't returning a listing in exactly the
>>> expected format, so it's being parsed incorrectly.
>>>
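The lookup Daryn describes can be modeled in a few lines of Java to show why a listing that parses badly surfaces as a FileNotFoundException. This is a hypothetical sketch, not Hadoop's FTPFileSystem source: the class FtpLookupSketch, the FileEntry type, and the getFileStatus helper are illustrative names, and a Map of directory listings stands in for the FTP server.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical model of the lookup strategy described in the thread: to stat a
 * path, list its parent directory and scan the entries for a matching basename.
 * NOT Hadoop's FTPFileSystem code; all names here are illustrative.
 */
public class FtpLookupSketch {

    /** One entry as it might come out of parsing an FTP LIST reply (hypothetical). */
    static final class FileEntry {
        final String name;
        final boolean isDirectory;

        FileEntry(String name, boolean isDirectory) {
            this.name = name;
            this.isDirectory = isDirectory;
        }
    }

    /** Last path component: "/some/path/here" -> "here". */
    static String basename(String path) {
        return path.substring(path.lastIndexOf('/') + 1);
    }

    /** Parent directory: "/some/path/here" -> "/some/path", "/here" -> "/". */
    static String parent(String path) {
        int slash = path.lastIndexOf('/');
        return slash <= 0 ? "/" : path.substring(0, slash);
    }

    /**
     * Fetches the parent's listing (here: a map standing in for the server) and
     * scans every child linearly, so each lookup pays for the whole listing.
     */
    static FileEntry getFileStatus(Map<String, List<FileEntry>> listings, String path)
            throws FileNotFoundException {
        List<FileEntry> children = listings.get(parent(path));
        if (children != null) {
            String wanted = basename(path);
            for (FileEntry child : children) {
                // Only basenames are compared; a mis-parsed name never matches.
                if (basename(child.name).equals(wanted)) {
                    return child;
                }
            }
        }
        throw new FileNotFoundException("File " + path + " does not exist.");
    }

    public static void main(String[] args) {
        Map<String, List<FileEntry>> listings = new HashMap<>();
        listings.put("/some/path", List.of(new FileEntry("here", true)));
        // Hypothetical listing whose name field was split incorrectly by the parser.
        listings.put("/other", List.of(new FileEntry("Apr 24 02:34 here", true)));

        for (String p : List.of("/some/path/here", "/other/here")) {
            try {
                System.out.println(p + " -> dir=" + getFileStatus(listings, p).isDirectory);
            } catch (FileNotFoundException e) {
                System.out.println(p + " -> " + e.getMessage());
            }
        }
    }
}
```

Running main finds /some/path/here, but reports /other/here as missing even though the entry is present, because the mis-parsed name no longer equals the basename "here". Every lookup also rescans the full parent listing, which is the inefficiency mentioned above.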
>>>  Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"