search-hadoop.com
HDFS >> mail # dev >> Why failed to use Distcp over FTP protocol?


Re: Why failed to use Distcp over FTP protocol?
If I execute 'hadoop distcp hdfs:///tmp/test1.txt
ftp://ftpuser:ftpuser@hostname/tmp/', the exception will be:
attempt_201304222240_0006_m_000000_1: log4j:ERROR Could not connect to
remote log4j server at [localhost]. We will try again later.
13/04/23 19:31:33 INFO mapred.JobClient: Task Id :
attempt_201304222240_0006_m_000000_2, Status : FAILED
java.io.IOException: Cannot rename parent(source):
ftp://ftpuser:ftpuser@hostname/tmp/_distcp_logs_o6gzfy/_temporary/_attempt_201304222240_0006_m_000000_2,
parent(destination): ftp://ftpuser:ftpuser@hostname/tmp/_distcp_logs_o6gzfy
        at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:547)
        at org.apache.hadoop.fs.ftp.FTPFileSystem.rename(FTPFileSystem.java:512)
        at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:154)
        at org.apache.hadoop.mapred.FileOutputCommitter.moveTaskOutputs(FileOutputCommitter.java:172)
        at org.apache.hadoop.mapred.FileOutputCommitter.commitTask(FileOutputCommitter.java:132)
        at org.apache.hadoop.mapred.OutputCommitter.commitTask(OutputCommitter.java:221)
        at org.apache.hadoop.mapred.Task.commit(Task.java:1019)
        at org.apache.hadoop.mapred.Task.done(Task.java:889)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:373)
        at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
        at java.security.AccessController.doPrivileged(AccessController.java:310)
        at javax.security.auth.Subject.doAs(Subject.java:573)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
        at org.apache.hadoop.mapred.Child.main(Child.java:249)
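The "Cannot rename parent" message suggests the FTP filesystem only renames a file within a single parent directory, while the MapReduce output committer commits a task by moving files from a `_temporary/_attempt_*` directory up into the job output directory, which is a cross-directory rename. A minimal sketch of that constraint (a hypothetical helper, not the actual Hadoop code):

```python
import posixpath

def ftp_rename(src, dst):
    # Simplified sketch of a rename that, like the FTPFileSystem behavior
    # suggested by the stack trace above, refuses to move a file between
    # two different parent directories.
    if posixpath.dirname(src) != posixpath.dirname(dst):
        raise IOError("Cannot rename parent(source): %s, parent(destination): %s"
                      % (src, posixpath.dirname(dst)))
    return dst

# A commit-style move out of _temporary crosses parent directories,
# so it is refused:
try:
    ftp_rename("/tmp/_logs/_temporary/_attempt_0", "/tmp/_logs/_attempt_0")
except IOError as e:
    print("rename refused:", e)
```

Under that assumption, DistCp *to* an FTP destination would fail at commit time even though the copy itself succeeded, which matches the symptoms below.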

2013/4/24 sam liu <[EMAIL PROTECTED]>

> Now,  I can successfully run "hadoop distcp ftp://ftpuser:ftpuser@hostname/tmp/test1.txt
> hdfs:///tmp/test1.txt"
>
> But it failed on "hadoop distcp hdfs:///tmp/test1.txt
> ftp://ftpuser:ftpuser@hostname/tmp/test1.txt.v1", which returns an error like:
> attempt_201304222240_0005_m_000000_1: log4j:ERROR Could not connect to
> remote log4j server at [localhost]. We will try again later.
> 13/04/23 18:59:05 INFO mapred.JobClient: Task Id :
> attempt_201304222240_0005_m_000000_2, Status : FAILED
> java.io.IOException: Copied: 0 Skipped: 0 Failed: 1
>         at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:582)
>         at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
>         at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:435)
>         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:371)
>
>         at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
>         at java.security.AccessController.doPrivileged(AccessController.java:310)
>         at javax.security.auth.Subject.doAs(Subject.java:573)
>         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
>         at org.apache.hadoop.mapred.Child.main(Child.java:249)
>
>
> 2013/4/24 sam liu <[EMAIL PROTECTED]>
>
>> I can successfully execute "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname",
>> and it returns the root path of the Linux system.
>>
>> But it failed to execute "hadoop fs -rm
>> ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here", and it returns:
>> rm: Delete failed ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here
>>
>>
>> 2013/4/24 Daryn Sharp <[EMAIL PROTECTED]>
>>
>>>  The FTP fs is listing the contents of the given path's parent
>>> directory, and then trying to match the basename of each child path
>>> returned against the basename of the given path, which is quite
>>> inefficient. The FNF (FileNotFoundException) means it didn't find a
>>> match for the basename. It may be that the FTP server isn't returning
>>> a listing in exactly the expected format, so it's being parsed
>>> incorrectly.
>>>
>>>  Does "hadoop fs -ls ftp://hadoopadm:xxxxxxxx@ftphostname/some/path/here"
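The lookup described in the reply above can be sketched as follows (a simplified illustration, not Hadoop's actual code; `list_parent` stands in for the FTP LIST call and all names are hypothetical):

```python
import posixpath

def get_file_status(path, list_parent):
    # Sketch of the lookup described above: list the parent directory,
    # then match each child's basename against the target's basename.
    target = posixpath.basename(path)
    for child in list_parent(posixpath.dirname(path)):
        if posixpath.basename(child) == target:
            return child
    # No basename matched; Hadoop surfaces this as a FileNotFoundException.
    raise FileNotFoundError(path)

# If the server's listing is parsed in an unexpected format (say, a stray
# suffix is kept on each name), every basename comparison fails and the
# path appears not to exist even though it does:
bad_listing = lambda d: [d + "/here*"]
```

This is consistent with `-ls` on the parent succeeding while `-rm` on the child fails: the listing itself works, but the basename match against its (mis)parsed entries does not.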