MapReduce user mailing list — ALL HDFS Blocks on the Same Machine if Replication factor = 1
Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1
Hello,

On 10.06.2013 at 15:36, Razen Al Harbi <[EMAIL PROTECTED]> wrote:

> I have deployed Hadoop on a cluster of 20 machines. I set the replication factor to one. When I put a file (larger than HDFS block size) into HDFS, all the blocks are stored on the machine where the Hadoop put command is invoked.
>
> For higher replication factor, I see the same behavior but the replicated blocks are stored randomly on all the other machines.
>
> Is this a normal behavior, if not what would be the cause?

Yes, this is normal behavior. When an HDFS client runs on a host that is also a DataNode (always the case when a reducer writes its output), the first copy of each block is stored on that very node. This optimizes latency: writing to a local disk is faster than writing across the network.

The second copy of the block is stored on a random host in a different rack (if your cluster is configured to be rack-aware), to spread the data across failure domains.

The third copy of the block is stored on another random host in that same remote rack.
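The placement rules above can be sketched as a small simulation (this is an illustrative model, not actual Hadoop code; node and rack names are made up):

```python
import random

def place_replicas(writer_node, racks, replication):
    """Sketch of HDFS's default replica placement policy.

    racks: dict mapping rack name -> list of node names.
    Returns the list of nodes chosen for one block's replicas.
    """
    # First replica: the writer's local DataNode.
    placements = [writer_node]
    if replication == 1:
        return placements  # with replication 1, everything stays local

    local_rack = next(r for r, nodes in racks.items() if writer_node in nodes)

    # Second replica: a random node in a different (remote) rack.
    remote_rack = random.choice([r for r in racks if r != local_rack])
    second = random.choice(racks[remote_rack])
    placements.append(second)

    # Third replica: a different node in that same remote rack.
    if replication >= 3:
        third = random.choice([n for n in racks[remote_rack] if n != second])
        placements.append(third)
    return placements

racks = {"rack1": ["n1", "n2", "n3"], "rack2": ["n4", "n5", "n6"]}
print(place_replicas("n1", racks, 1))  # every block lands on n1
print(place_replicas("n1", racks, 3))  # n1 plus two nodes in the other rack
```

With replication 1 the simulation never leaves the writer's node, which is exactly the behavior you observed.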

So your observations are correct.

Kai

--
Kai Voigt
[EMAIL PROTECTED]