MapReduce >> mail # user >> ALL HDFS Blocks on the Same Machine if Replication factor = 1

Razen Al Harbi 2013-06-10, 13:36
Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1
It's normal. The default placement policy stores the first replica on the local node (if the writing client runs on a DataNode) for performance, then chooses a second node on a different rack, then a third node on the same rack as the second. With a replication factor of 1, only that first, local replica is written, so every block of the file ends up on the machine running the put command. Using a replication factor of 1 is not advised if you value your data. However, if you want a better distribution of blocks with a single replica, consider uploading your files from a host that is not running a DataNode; the blocks will then be placed on random DataNodes.
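To verify where the blocks of a file actually landed, you can run fsck against it. A quick sketch (the path /user/razen/bigfile.dat is just an example):

```shell
# List every block of the file and the DataNode(s) holding it.
# With replication=1 and the put run on a DataNode, all block
# locations should show that same node's address.
hdfs fsck /user/razen/bigfile.dat -files -blocks -locations
```

The -locations flag prints the DataNode host:port for each replica, which makes the skew obvious at a glance.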


On Jun 10, 2013, at 8:36 AM, Razen Al Harbi wrote:

> Hello,
> I have deployed Hadoop on a cluster of 20 machines. I set the replication factor to one. When I put a file (larger than HDFS block size) into HDFS, all the blocks are stored on the machine where the Hadoop put command is invoked.
> For higher replication factor, I see the same behavior but the replicated blocks are stored randomly on all the other machines.
> Is this normal behavior? If not, what would be the cause?
> Thanks,
> Razen
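The setup Razen describes can be reproduced per-file without changing the cluster-wide default, e.g. (file and path names are illustrative):

```shell
# Upload a file with a replication factor of 1 for this file only,
# overriding dfs.replication for this command.
hdfs dfs -D dfs.replication=1 -put bigfile.dat /user/razen/

# Alternatively, change the replication factor of an existing file.
hdfs dfs -setrep 1 /user/razen/bigfile.dat
```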