MapReduce >> mail # user >> ALL HDFS Blocks on the Same Machine if Replication factor = 1


Razen Al Harbi 2013-06-10, 13:36
Daryn Sharp 2013-06-10, 13:53
Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1
Hello,

On 10.06.2013 at 15:36, Razen Al Harbi <[EMAIL PROTECTED]> wrote:

> I have deployed Hadoop on a cluster of 20 machines. I set the replication factor to one. When I put a file (larger than HDFS block size) into HDFS, all the blocks are stored on the machine where the Hadoop put command is invoked.
>
> For higher replication factor, I see the same behavior but the replicated blocks are stored randomly on all the other machines.
>
> Is this a normal behavior, if not what would be the cause?

Yes, this is normal behavior. When an HDFS client happens to run on a host that is also a DataNode (always the case when a reducer writes its output), the first copy of each block is stored on that very node. This optimizes write latency: writing to a local disk is faster than writing across the network.

The second copy of the block gets stored on a random host in another rack (if your cluster is configured to be rack-aware), to spread the data across racks for fault tolerance.

The third copy of the block gets stored on a different random host in that same remote rack.

So your observations are correct.
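To make the policy concrete, here is a rough sketch in Python of the placement logic described above. This is a simplified, hypothetical model (the function name and data layout are made up for illustration); the real logic lives in Hadoop's default block placement policy and handles many more constraints (node load, available space, decommissioning, etc.):

```python
import random

def place_replicas(writer_node, nodes_by_rack, replication):
    """Simplified model of HDFS's default replica placement.
    nodes_by_rack maps rack name -> list of DataNode names."""
    all_nodes = [n for nodes in nodes_by_rack.values() for n in nodes]
    replicas = []

    # 1st replica: on the writer's own node, if it is a DataNode;
    # otherwise on a random node. This is why replication factor 1
    # piles every block onto the machine running `hadoop fs -put`.
    first = writer_node if writer_node in all_nodes else random.choice(all_nodes)
    replicas.append(first)
    if replication == 1:
        return replicas

    # 2nd replica: a random node in a *different* rack.
    local_rack = next(r for r, ns in nodes_by_rack.items() if first in ns)
    remote_rack = random.choice([r for r in nodes_by_rack if r != local_rack])
    second = random.choice(nodes_by_rack[remote_rack])
    replicas.append(second)
    if replication == 2:
        return replicas

    # 3rd replica: a different node in that same remote rack.
    third = random.choice([n for n in nodes_by_rack[remote_rack] if n != second])
    replicas.append(third)

    # Any further replicas: random nodes not already holding a copy.
    while len(replicas) < replication:
        replicas.append(random.choice([n for n in all_nodes if n not in replicas]))
    return replicas
```

To see where the blocks of a real file actually landed, you can run `hdfs fsck /path/to/file -files -blocks -locations` on the cluster.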

Kai

--
Kai Voigt
[EMAIL PROTECTED]
Shahab Yunus 2013-06-10, 13:57