HDFS user mailing list


Re: Copy Vs DistCP
For copying large files, I prefer distcp.
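A rough way to see where the cp/distcp crossover falls, using the "about a GB/s" figure from the quoted thread. This is a minimal back-of-the-envelope sketch, not a benchmark: it assumes cp is a single stream at that bandwidth, while distcp pays a fixed startup cost and then runs several streams in parallel. The 60 s startup and 20 streams are hypothetical placeholders, not measured values.

```python
def cp_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Single-stream copy: time is size divided by bandwidth."""
    return size_gb / bandwidth_gb_s


def distcp_seconds(size_gb: float, bandwidth_gb_s: float,
                   startup_s: float, streams: int) -> float:
    """Fixed job-startup cost, then `streams` copies running in parallel.

    Assumes aggregate throughput scales linearly with the stream count,
    which a real cluster only approximates.
    """
    return startup_s + size_gb / (bandwidth_gb_s * streams)


def break_even_gb(bandwidth_gb_s: float, startup_s: float, streams: int) -> float:
    """Size S at which both take equal time.

    Solving S/B = T + S/(B*p) for S gives S = T*B*p / (p - 1).
    """
    if streams <= 1:
        return float("inf")  # no parallel speed-up, so cp never loses
    return startup_s * bandwidth_gb_s * streams / (streams - 1)


if __name__ == "__main__":
    B = 1.0   # ~1 GB/s, the figure from the thread
    T = 60.0  # hypothetical distcp startup time, seconds
    P = 20    # hypothetical number of parallel copy streams
    for size in (10, 50, 500):
        print(f"{size:>4} GB: cp {cp_seconds(size, B):6.1f} s, "
              f"distcp {distcp_seconds(size, B, T, P):6.1f} s")
    print(f"break-even ~ {break_even_gb(B, T, P):.1f} GB")
```

With these placeholder numbers the break-even lands at roughly 63 GB, which matches the thread's intuition: cp wins at tens of GB, distcp at hundreds. Plug in your own cluster's startup time and parallelism to get a meaningful threshold.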
On Sun, Apr 14, 2013 at 11:31 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:

> On Sun, Apr 14, 2013 at 10:33 AM, Mathias Herberts <[EMAIL PROTECTED]> wrote:
>
>>
>> > This is absolutely true.  Distcp dominates cp for large copies.  On the
>> > other hand cp dominates distcp for convenience.
>> >
>> > In my own experience, I love cp when copying relatively small amounts
>> > of data (10s of GB) where the available bandwidth of about a GB/s allows
>> > the copy to complete in less time than it takes distcp to get started.
>> >
>> > At larger sizes (100s of GB and up), the startup time of distcp
>> > doesn't matter because once it gets going, it moves data much faster.
>>
>> Maybe we could put together an 'fs -smartcp' which chooses wisely between
>> copy and distcp depending on file size.
>>
>
> Uh... hmm...
>
> This is a good suggestion.  Obvious in fact.  In retrospect.
>
> I would also suggest that the new command be called "distcp".
>
>
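The 'fs -smartcp' idea from the thread could be approximated by a small wrapper script. This is a hypothetical Python sketch, not an existing Hadoop command: `smartcp`, `choose_command`, and the 100 GiB `THRESHOLD_BYTES` are illustrative names and values (the threshold loosely follows the "100s of GB" remark). It assumes Hadoop 2.x or later, where `hadoop fs -du -s` prints the logical size in bytes as the first column.

```python
import subprocess
import sys
from typing import List

# Hypothetical cut-over point, loosely based on the "100s of GB" remark.
THRESHOLD_BYTES = 100 * 1024**3


def parse_du(output: str) -> int:
    """Sum the first column (size in bytes) of each `fs -du -s` output line."""
    return sum(int(line.split()[0]) for line in output.splitlines() if line.strip())


def total_size_bytes(paths: List[str]) -> int:
    """Ask HDFS for the total logical size of the source paths."""
    out = subprocess.run(["hadoop", "fs", "-du", "-s", *paths],
                         check=True, capture_output=True, text=True).stdout
    return parse_du(out)


def choose_command(size_bytes: int, srcs: List[str], dst: str,
                   threshold: int = THRESHOLD_BYTES) -> List[str]:
    """Plain `fs -cp` for small copies, `distcp` once startup cost pays off."""
    if size_bytes < threshold:
        return ["hadoop", "fs", "-cp", *srcs, dst]
    return ["hadoop", "distcp", *srcs, dst]


def smartcp(srcs: List[str], dst: str) -> int:
    """Measure, pick a copy method, run it, and return its exit code."""
    cmd = choose_command(total_size_bytes(srcs), srcs, dst)
    print("running:", " ".join(cmd), file=sys.stderr)
    return subprocess.run(cmd).returncode
```

Usage would look like `smartcp(["/data/in"], "/backup/in")`. Keeping `choose_command` separate from the subprocess calls makes the decision rule easy to test and to retune without a cluster.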