MapReduce >> mail # user >> ALL HDFS Blocks on the Same Machine if Replication factor = 1


Razen Al Harbi 2013-06-10, 13:36
Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1
It's normal. The default placement strategy stores the first replica on the node where the writer is running (for performance), then chooses a second node at random on another rack, then a third node on the same rack as the second. Using a replication factor of 1 is not advised if you value your data. However, if you want a better distribution of blocks with 1 replica, consider uploading your files from a host that is not running a DataNode.

Daryn
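
For anyone finding this thread later: you can confirm where the blocks of a file actually landed with `fsck`. A minimal sketch, run against a live cluster; the file path is a placeholder:

```shell
# List every block of the file and the DataNode(s) holding each replica.
# /user/razen/bigfile.dat is a hypothetical path for illustration.
hadoop fsck /user/razen/bigfile.dat -files -blocks -locations
```

If every block reports the same DataNode, you are seeing the writer-local placement described above; uploading from a non-DN host removes that preference, so single-replica blocks get spread across the cluster.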

On Jun 10, 2013, at 8:36 AM, Razen Al Harbi wrote:

> Hello,
>
> I have deployed Hadoop on a cluster of 20 machines. I set the replication factor to one. When I put a file (larger than HDFS block size) into HDFS, all the blocks are stored on the machine where the Hadoop put command is invoked.
>
> For higher replication factors, I see the same behavior, but the replicated blocks are stored randomly across the other machines.
>
> Is this normal behavior? If not, what could be the cause?
>
> Thanks,
>
> Razen
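
For reference, the replication factor can be set per command rather than cluster-wide, and raised after upload, at which point the NameNode schedules the extra copies on other nodes. A sketch against a live cluster; paths are illustrative:

```shell
# Upload a single file with replication factor 1, without touching
# the cluster-wide dfs.replication setting in hdfs-site.xml.
hadoop fs -D dfs.replication=1 -put bigfile.dat /user/razen/

# Raise the replication of an already-uploaded file to 3 and wait (-w)
# until the NameNode has placed the additional replicas.
hadoop fs -setrep -w 3 /user/razen/bigfile.dat
```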
Kai Voigt 2013-06-10, 13:47
Shahab Yunus 2013-06-10, 13:57