MapReduce user mailing list


Re: Copy Vs DistCP
On Sun, Apr 14, 2013 at 10:33 AM, Mathias Herberts <
[EMAIL PROTECTED]> wrote:

>
> >
> > This is absolutely true.  Distcp dominates cp for large copies.  On the
> > other hand, cp dominates distcp for convenience.
> >
> > In my own experience, I love cp when copying relatively small amounts of
> > data (10's of GB), where the available bandwidth of about a GB/s allows the
> > copy to complete in less time than it takes distcp to get started.
> >
> > At larger sizes (100's of GB and up), the startup time of distcp doesn't
> > matter, because once it gets going, it moves data much faster.
>
> Maybe we could put together an 'fs -smartcp' which chooses wisely between
> copy and distcp depending on file size.
>

Uh... hmm...

This is a good suggestion.  Obvious in fact.  In retrospect.

I would also suggest that the new command be called "distcp".
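The "smartcp" idea in this thread boils down to a break-even calculation: cp costs roughly size / bandwidth, while distcp pays a fixed job-startup cost and then copies at a higher aggregate rate. Below is a minimal sketch of that decision rule. Only the "about a GB/s" cp figure comes from the thread. The distcp startup time and aggregate bandwidth are assumptions chosen for illustration, and `CopyCostModel` and `choose_copy_tool` are hypothetical names, not part of Hadoop's `fs` shell or distcp.

```python
from dataclasses import dataclass

GB = 1024 ** 3  # bytes


@dataclass(frozen=True)
class CopyCostModel:
    """Illustrative cost model; every default except cp_bandwidth is an assumption."""
    cp_bandwidth: float = 1.0 * GB       # bytes/s for a single-client cp ("about a GB/s" per the thread)
    distcp_startup_s: float = 60.0       # ASSUMED fixed cost to launch the distcp MapReduce job
    distcp_bandwidth: float = 10.0 * GB  # ASSUMED aggregate bytes/s once distcp's mappers are running

    def cp_seconds(self, total_bytes: int) -> float:
        return total_bytes / self.cp_bandwidth

    def distcp_seconds(self, total_bytes: int) -> float:
        return self.distcp_startup_s + total_bytes / self.distcp_bandwidth

    def break_even_bytes(self) -> float:
        # Solve size / cp_bw == startup + size / distcp_bw for size.
        if self.distcp_bandwidth <= self.cp_bandwidth:
            return float("inf")  # distcp never catches up, so cp always wins
        return self.distcp_startup_s / (1 / self.cp_bandwidth - 1 / self.distcp_bandwidth)


def choose_copy_tool(total_bytes: int, model: CopyCostModel = CopyCostModel()) -> str:
    """Hypothetical 'smartcp' policy: pick whichever tool is estimated to finish first."""
    if model.cp_seconds(total_bytes) <= model.distcp_seconds(total_bytes):
        return "cp"
    return "distcp"


if __name__ == "__main__":
    print(choose_copy_tool(10 * GB))   # cp     (10 s vs. 61 s under these assumptions)
    print(choose_copy_tool(500 * GB))  # distcp (500 s vs. 110 s)
    print(round(CopyCostModel().break_even_bytes() / GB, 1))  # 66.7 GB crossover
```

With these assumed numbers the crossover falls at roughly 67 GB, between the "10's of GB" and "100's of GB" regimes described in the thread. A real tool would measure the source size first (for example with `hadoop fs -du -s`) and calibrate the startup and bandwidth figures for its own cluster.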