MapReduce >> mail # user >> Re: copy files from ftp to hdfs in parallel, distcp failed

Re: copy files from ftp to hdfs in parallel, distcp failed
On 11 July 2013 06:27, Hao Ren <[EMAIL PROTECTED]> wrote:

> Hi,
>
> I am running HDFS on Amazon EC2.
>
> Say I have an FTP server that stores some data.
>
> I just want to copy this data directly to HDFS in parallel (which
> may be more efficient).
>
> I think hadoop distcp is what I need.
>

http://hadoop.apache.org/docs/stable/distcp.html

DistCp (distributed copy) is a tool used for large inter/intra-cluster
copying. It uses MapReduce to effect its distribution, error handling and
recovery, and reporting.

I doubt this is going to help. Are there a lot of files? If so, how about
running multiple copy jobs to HDFS?
-balaji
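
The "multiple copy jobs" idea above could be sketched roughly as follows. This is only an illustration and not something posted in the thread: it assumes the `hadoop` command is on the PATH, that the cluster's FTP filesystem support accepts `ftp://` source URIs in `hadoop fs -cp`, and that the list of files on the FTP server is already known. The function names, the example URLs and the worker count are all hypothetical.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_copy_command(ftp_url, hdfs_dir):
    # One independent copy job per file. Assumes `hadoop fs -cp` can read
    # from an ftp:// URI on this cluster (an assumption, not verified here).
    return ["hadoop", "fs", "-cp", ftp_url, hdfs_dir]


def copy_all(ftp_urls, hdfs_dir, workers=4, run=subprocess.call):
    # Launch up to `workers` copy jobs at a time and collect exit codes.
    # `run` is injectable so the scheduling logic can be tried without Hadoop.
    commands = [build_copy_command(url, hdfs_dir) for url in ftp_urls]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        exit_codes = list(pool.map(run, commands))
    # Map each source URL to the exit code of its copy job (0 means success).
    return dict(zip(ftp_urls, exit_codes))


# Hypothetical usage (host, paths and credentials are placeholders):
# results = copy_all(
#     ["ftp://user:password@ftp.example.com/data/part-1.csv",
#      "ftp://user:password@ftp.example.com/data/part-2.csv"],
#     "hdfs:///user/hadoop/incoming/",
# )
# failed = [url for url, code in results.items() if code != 0]
```

Threads are enough here because each job just waits on an external process. Any URL with a non-zero exit code can simply be retried on its own.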