On 11 July 2013 06:27, Hao Ren <[EMAIL PROTECTED]> wrote:
> I am running HDFS on Amazon EC2.
> Say I have an FTP server that stores some data.
> I just want to copy this data directly to HDFS in parallel (which
> may be more efficient).
> I think hadoop distcp is what I need.
DistCp (distributed copy) is a tool used for large inter/intra-cluster
copying. It uses MapReduce to effect its distribution, error handling and
recovery, and reporting.
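To show roughly what "uses MapReduce to effect its distribution" means, here is a simplified sketch. It is not DistCp's actual code. It takes a listing of (path, size) pairs and spreads the files across a fixed number of map tasks so each task copies about the same number of bytes. The balancing uses a plain greedy heuristic (largest file first, always to the least-loaded task), which I chose for illustration. The file names and sizes are made up. DistCp itself lets you set the number of maps with its `-m` option.

```python
from typing import List, Tuple


def split_listing(files: List[Tuple[str, int]], num_maps: int) -> List[List[str]]:
    """Greedily assign each file to the map task with the fewest bytes so far.

    `files` is a list of (path, size_in_bytes) pairs. Placing larger files
    first keeps the per-task byte totals close to each other.
    Illustrative heuristic only, not DistCp's real splitting logic.
    """
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    buckets: List[List[str]] = [[] for _ in range(num_maps)]
    totals = [0] * num_maps
    for path, size in sorted(files, key=lambda f: f[1], reverse=True):
        i = totals.index(min(totals))  # index of the least-loaded task
        buckets[i].append(path)
        totals[i] += size
    return buckets


if __name__ == "__main__":
    # Hypothetical listing: (file name, size in bytes)
    listing = [("a.csv", 700), ("b.csv", 500), ("c.csv", 400), ("d.csv", 300)]
    for n, paths in enumerate(split_listing(listing, 2)):
        print(f"map {n}: {paths}")
    # map 0: ['a.csv', 'd.csv']
    # map 1: ['b.csv', 'c.csv']
```

Each map then copies only its own bucket, so the copy runs in parallel. When every file comes from a single FTP server, though, all the maps still share that one server's bandwidth.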
I doubt this is going to help. Are there a lot of files? If so, how about
running multiple copy jobs to HDFS?
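Here is a rough sketch of the "multiple copy jobs" idea. It starts one `hadoop fs -cp` per file and runs at most `max_jobs` of them at once. It assumes your Hadoop install can read `ftp://` URLs through its FTP filesystem. If it can't, you could fetch each file locally and use `hadoop fs -put` instead. The function names, host, and paths are hypothetical. The `run` parameter is there so you can dry-run the commands before touching the cluster.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def copy_command(ftp_url: str, hdfs_dir: str) -> List[str]:
    # Assumption: Hadoop's FTP filesystem handles ftp:// URLs on this cluster.
    return ["hadoop", "fs", "-cp", ftp_url, hdfs_dir]


def run_parallel_copies(
    ftp_urls: Sequence[str],
    hdfs_dir: str,
    max_jobs: int = 4,
    run: Callable[[List[str]], int] = subprocess.call,
) -> List[int]:
    """Run one copy command per file, at most `max_jobs` at a time.

    Returns the exit codes in the same order as `ftp_urls`.
    """
    commands = [copy_command(url, hdfs_dir) for url in ftp_urls]
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(run, commands))


if __name__ == "__main__":
    # Hypothetical FTP URLs; the dry-run runner prints commands instead of executing them.
    urls = ["ftp://user:password@ftp.example.com/data/part-%02d.csv" % i for i in range(4)]
    dry_run = lambda cmd: print(" ".join(cmd)) or 0
    codes = run_parallel_copies(urls, "hdfs:///user/hao/data", max_jobs=2, run=dry_run)
    print("failed:", [u for u, c in zip(urls, codes) if c != 0])
```

If the FTP server is the bottleneck, raising `max_jobs` won't buy much. It is worth testing a few values against your server.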