Hadoop >> mail # user >> Copy Vs DistCP


Re: Copy Vs DistCP
DistCP is a full-blown MapReduce job (mapper-only, where the mappers do a
"fully" parallel copy to the destination).

CP appears (correct me if I'm wrong) to simply invoke the FileSystem and
issue a copy command for every source file.
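
To make the contrast concrete, here is a rough local-filesystem analogy in Python. It is only a sketch of the two copy patterns described above, not how Hadoop implements either command: the function names `sequential_copy` and `parallel_copy` are hypothetical, threads stand in for mappers, and `shutil.copy` stands in for a FileSystem copy call.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def sequential_copy(sources, dest_dir):
    """cp-style pattern: a single client copies each source file in turn."""
    return [shutil.copy(src, dest_dir) for src in sources]


def parallel_copy(sources, dest_dir, workers=4):
    """DistCP-style analogy: the file list is shared out among independent
    workers (threads here, standing in for mappers) that copy concurrently."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda src: shutil.copy(src, dest_dir), sources))


if __name__ == "__main__":
    # Build a few small throwaway files and copy them both ways.
    root = Path(tempfile.mkdtemp())
    src_dir = root / "src"
    src_dir.mkdir()
    files = []
    for i in range(3):
        path = src_dir / f"part-{i}.txt"
        path.write_text(f"block {i}")
        files.append(path)

    for name, copier in (("seq", sequential_copy), ("par", parallel_copy)):
        out_dir = root / name
        out_dir.mkdir()
        copier(files, out_dir)
        print(name, sorted(p.name for p in out_dir.iterdir()))
```

Both functions end with the same files at the destination; the difference is only in who does the work. With one client and a handful of files the sequential pattern is simpler, while splitting the list pays off as the number and size of files grow.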

I have an additional question: how is CP, when run within a single
cluster, optimized (if at all)?
On Wed, Apr 10, 2013 at 6:20 PM, KayVajj <[EMAIL PROTECTED]> wrote:

> I have a few questions regarding the usage of DistCP for copying files
> within the same cluster.
>
>
> 1) Which one is better within the same cluster, and what factors (like file
> size etc.) would influence the choice of one over the other?
>
> 2) When we run a cp command like the one below from a client node of the
> cluster (not a data node), how does the cp command work?
>      i) like an MR job, or
>     ii) by copying the files locally and then copying them back to the new
>         location?
>
> Example of the copy command
>
> hdfs dfs -cp /<some_location>/file /<new_location>/
>
> Thanks, your responses are appreciated.
>
> -- Kay
>

--
Jay Vyas
http://jayunit100.blogspot.com