Hadoop, mail # user - Copy Vs DistCP


Re: Copy Vs DistCP
Azuryy Yu 2013-04-11, 01:30
The cp command is not parallel. It just calls FileSystem, even though
DFSClient has multiple threads.

DistCp can work well on the same cluster.
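
To make the difference concrete, here is a rough local analogy in Python. It is a sketch under stated assumptions, not how Hadoop is implemented: it uses local temporary files instead of HDFS, and threads in one process stand in for DistCp map tasks, which in reality run on different cluster nodes. The names stream_copy, sequential_copy, parallel_copy and BUFFER_SIZE are made up for this illustration.

```python
import shutil
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

BUFFER_SIZE = 4096  # illustrative value only, not a Hadoop setting


def stream_copy(src: Path, dst: Path) -> None:
    """Copy one file by streaming its bytes through this process."""
    with src.open("rb") as fin, dst.open("wb") as fout:
        shutil.copyfileobj(fin, fout, BUFFER_SIZE)


def sequential_copy(files, dst_dir: Path) -> None:
    """cp-like: a single client copies one source file after another."""
    for src in files:
        stream_copy(src, dst_dir / src.name)


def parallel_copy(files, dst_dir: Path, workers: int = 4) -> None:
    """DistCp-like: each worker stands in for a map task copying its share."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all copies and re-raises any worker exception.
        list(pool.map(lambda src: stream_copy(src, dst_dir / src.name), files))


def demo() -> bool:
    """Copy the same files both ways and check that the results match."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        src_dir, seq_dir, par_dir = root / "src", root / "seq", root / "par"
        for d in (src_dir, seq_dir, par_dir):
            d.mkdir()
        files = []
        for i in range(8):
            f = src_dir / f"part-{i}"
            f.write_bytes(bytes([i]) * 10_000)
            files.append(f)
        sequential_copy(files, seq_dir)
        parallel_copy(files, par_dir)
        return all(
            (seq_dir / f.name).read_bytes()
            == f.read_bytes()
            == (par_dir / f.name).read_bytes()
            for f in files
        )


if __name__ == "__main__":
    print(demo())
```

Both paths produce identical copies. The only difference is how the work is scheduled: one file at a time through a single client, or many files at once across workers. That is why file count and total data size are the main factors when choosing between the two.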
On Thu, Apr 11, 2013 at 8:17 AM, KayVajj <[EMAIL PROTECTED]> wrote:

> The FileSystem copy utility copies files byte by byte, if I'm not wrong.
> Could it be possible that the cp command works with blocks and moves them,
> which could be significantly more efficient?
>
>
> Also, how does the cp command work if the file is distributed across
> different data nodes?
>
> Thanks
> Kay
>
>
> On Wed, Apr 10, 2013 at 4:48 PM, Jay Vyas <[EMAIL PROTECTED]> wrote:
>
>> DistCP is a full-blown MapReduce job (mapper only, where the mappers do a
>> "fully" parallel copy to the destination).
>>
>> CP appears (correct me if I'm wrong) to simply invoke the FileSystem and
>> issue a copy command for every source file.
>>
>> I have an additional question: how is CP, when used within a single
>> cluster, optimized (if at all)?
>>
>>
>>
>> On Wed, Apr 10, 2013 at 7:28 PM, 麦树荣 <[EMAIL PROTECTED]> wrote:
>>
>>> Hi,
>>>
>>> I think it's better to use Copy within the same cluster and DistCP
>>> between clusters. The cp command is a Hadoop-internal parallel process and
>>> will not copy files locally.
>>>
>>> ------------------------------
>>>  麦树荣
>>>
>>>  *From:* KayVajj <[EMAIL PROTECTED]>
>>> *Date:* 2013-04-11 06:20
>>> *To:* [EMAIL PROTECTED]
>>> *Subject:* Copy Vs DistCP
>>> I have a few questions regarding the usage of DistCP for copying
>>> files in the same cluster.
>>>
>>>
>>> 1) Which one is better within the same cluster, and what factors (like
>>> file size etc.) would influence the choice of one over the other?
>>>
>>>  2) When we run a cp command like the one below from a client node of the
>>> cluster (not a data node), how does the cp command work?
>>>       i) Like an MR job, or
>>>      ii) by copying the files locally and then copying them back to the
>>>          new location?
>>>
>>>  Example of the copy command
>>>
>>>  hdfs dfs -cp /<some_location>/file /<new_location>/
>>>
>>>  Thanks, your responses are appreciated.
>>>
>>>  -- Kay
>>>
>>
>>
>>
>> --
>> Jay Vyas
>> http://jayunit100.blogspot.com
>>
>
>