MapReduce, mail # user - ALL HDFS Blocks on the Same Machine if Replication factor = 1

Razen Al Harbi 2013-06-10, 13:36
Re: ALL HDFS Blocks on the Same Machine if Replication factor = 1
Daryn Sharp 2013-06-10, 13:53
It's normal.  The default placement strategy stores the first replica of each block on the node performing the write, for performance, then chooses a random node on another rack for the second replica, then a different node on the same rack as the second for the third.  Using a replication factor of 1 is not advised if you value your data.  However, if you want a better distribution of blocks with 1 replica, consider uploading your files from a host that is not a DataNode: with no local DataNode to prefer, the first replica of each block goes to a randomly chosen node.
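
To make the placement rule concrete, here is a minimal Python sketch that simulates it. This is not Hadoop code: the function `place_replicas`, the node and rack names, and the two-rack layout are all hypothetical (the original question only says there are 20 machines). It shows why every block lands on the uploading node when that node is a DataNode and the replication factor is 1, and why uploading from a non-DataNode host spreads the blocks out.

```python
import random
from collections import Counter

def place_replicas(writer, topology, replication, rng):
    """Simplified model of the default placement rule described above.

    topology maps node name -> rack name (hypothetical layout).
    writer is the uploading host, or None if it is not a DataNode.
    Returns the list of nodes chosen for one block.
    """
    nodes = list(topology)
    replication = min(replication, len(nodes))

    # 1st replica: the writer's own node if it is a DataNode, else a random node.
    first = writer if writer in topology else rng.choice(nodes)
    chosen = [first]

    if replication >= 2:
        # 2nd replica: a random node on a different rack (any other node as fallback).
        off_rack = [n for n in nodes if topology[n] != topology[first]]
        chosen.append(rng.choice(off_rack or [n for n in nodes if n not in chosen]))

    if replication >= 3:
        # 3rd replica: a different node on the same rack as the 2nd replica.
        same_rack = [n for n in nodes
                     if topology[n] == topology[chosen[1]] and n not in chosen]
        chosen.append(rng.choice(same_rack or [n for n in nodes if n not in chosen]))

    # Any further replicas: random nodes not used yet.
    while len(chosen) < replication:
        chosen.append(rng.choice([n for n in nodes if n not in chosen]))
    return chosen

# Assumed cluster: 20 nodes spread over 2 racks.
topology = {f"node{i:02d}": f"rack{i % 2}" for i in range(20)}
rng = random.Random(0)

# 100 blocks written with replication 1 from a DataNode: all on that node.
local = Counter(place_replicas("node00", topology, 1, rng)[0] for _ in range(100))
# Same upload from a host that is not a DataNode: blocks spread across nodes.
remote = Counter(place_replicas(None, topology, 1, rng)[0] for _ in range(100))

print("upload from node00:", dict(local))
print("upload from non-DN host:", len(remote), "distinct nodes used")
```

On a real cluster, the actual locations can be checked with `hdfs fsck <path> -files -blocks -locations`.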


On Jun 10, 2013, at 8:36 AM, Razen Al Harbi wrote:

> Hello,
> I have deployed Hadoop on a cluster of 20 machines. I set the replication factor to one. When I put a file (larger than HDFS block size) into HDFS, all the blocks are stored on the machine where the Hadoop put command is invoked.
> With a higher replication factor, I see the same behavior, but the additional replicas are stored randomly on the other machines.
> Is this normal behavior? If not, what would be the cause?
> Thanks,
> Razen
