HDFS >> mail # user >> Re: Copy Vs DistCP


Re: Copy Vs DistCP
For copying large files, I prefer distcp.
On Sun, Apr 14, 2013 at 11:31 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

>
>
>
> On Sun, Apr 14, 2013 at 10:33 AM, Mathias Herberts <
> [EMAIL PROTECTED]> wrote:
>
>>
>> >
>> > This is absolutely true.  Distcp dominates cp for large copies.  On the
>> other hand cp dominates distcp for convenience.
>> >
>> > In my own experience, I love cp when copying relatively small amounts
>> of data (10's of GB) where the available bandwidth of about a GB/s allows
>> the copy to complete in less time than it takes distcp to get started.
>> >
>> > At larger sizes (100's of GB and up), the startup time of distcp
>> doesn't matter because once it gets going, it moves data much faster.
>>
>> Maybe we could put together a 'fs -smartcp' which chooses wisely between
>> copy and distcp depending on file size.
>>
>
> Uh... hmm...
>
> This is a good suggestion.  Obvious in fact.  In retrospect.
>
> I would also suggest that the new command be called "distcp".
>
>
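
For illustration only, the 'smartcp' idea discussed above could be sketched roughly as follows. This is a minimal sketch, not an existing Hadoop command: the function name `choose_copy_command` and the 100 GB cutoff are assumptions, loosely based on the "10's of GB" versus "100's of GB" figures quoted in the thread. The total size is passed in by the caller, who might obtain it from something like `hadoop fs -du -s`.

```python
# Hypothetical cutoff: below this, the startup cost of distcp is assumed
# to outweigh its higher throughput. Tune for your own cluster.
DISTCP_THRESHOLD_BYTES = 100 * 1024**3  # 100 GB (an assumption)


def choose_copy_command(src, dst, total_bytes,
                        threshold=DISTCP_THRESHOLD_BYTES):
    """Return the argv list for copying src to dst.

    Small copies use the plain shell copy (`hadoop fs -cp`);
    large copies use the MapReduce-based `hadoop distcp`.
    """
    if total_bytes >= threshold:
        return ["hadoop", "distcp", src, dst]
    return ["hadoop", "fs", "-cp", src, dst]


if __name__ == "__main__":
    # 20 GB: small enough that plain cp is chosen.
    print(" ".join(choose_copy_command("/data/in", "/data/out", 20 * 1024**3)))
    # 500 GB: large enough that distcp is chosen.
    print(" ".join(choose_copy_command("/data/in", "/data/out", 500 * 1024**3)))
```

The returned list could be handed to `subprocess.run` to actually perform the copy; the right threshold depends on cluster bandwidth and job startup time.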