Bill Q 2013-03-07, 04:21
Re: HDFS network traffic
Harsh J 2013-03-07, 05:23
Yes, the simple copy is a client operation. The client reads bytes from
the source and writes them to the destination, and is thereby in control
of failure handling, etc. However, if you want your cluster to do the
copy (and if the data set to copy is large), consider using the DistCp
(distributed-copy) MR job to do it.
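
To make the "client operation" point concrete, here is a minimal sketch of the read-then-write loop that a client-side copy boils down to. It uses only plain java.io streams, not the Hadoop API; the class name ClientCopySketch and the method copyThroughClient are made up for illustration, and the in-memory streams merely stand in for the HDFS source and destination. The point is that every byte passes through the process running the copy, which is why the node issuing the command sees the full file size as network traffic.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Hypothetical illustration only: this is not Hadoop's implementation.
public class ClientCopySketch {

    // Reads from the source and writes to the destination through a buffer
    // held by the client, returning the number of bytes that passed through.
    static long copyThroughClient(InputStream source, OutputStream destination,
                                  int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = source.read(buffer)) != -1) {
            destination.write(buffer, 0, n); // every byte is handled by the client
            total += n;
        }
        destination.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the source and destination files (assumption).
        byte[] fakeFile = new byte[1 << 20];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long moved = copyThroughClient(new ByteArrayInputStream(fakeFile), sink, 64 * 1024);
        System.out.println("Bytes that passed through the client: " + moved);
    }
}
```

For the cluster-side alternative, DistCp is normally invoked as hadoop distcp <source> <destination>, so the copy work is done by map tasks on the cluster rather than by the single client process.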
On Thu, Mar 7, 2013 at 9:51 AM, Bill Q <[EMAIL PROTECTED]> wrote:
> Hi All,
> I am working on converting a sequence file to mapfile and just discovered
> something I wasn't aware of.
> For example, suppose I am working on a 2-node cluster, one
> master/namenode/datanode, one slave/datanode. If I do hadoop dfs -cp
> /data/file1 /data/file2 (a 1G file) from the master, and monitor the NIC of
> both nodes, I saw that the master node sent the entire 1G file as traffic to
> the slave. This surprised me. Does this mean all the traffic has to go
> through the client node that runs the command (in this case, the master)
> when I do hadoop dfs -cp?
> Many thanks.