Re: question about hdfs data loss risk
Hi,

1) You may want to read about proper node decommissioning.
http://wiki.apache.org/hadoop/FAQ#I_want_to_make_a_large_cluster_smaller_by_taking_out_a_bunch_of_nodes_simultaneously._How_can_this_be_done.3F

2) The NameNode re-replicates blocks whose number of live replicas falls
below their replication factor.

3) The NameNode does not give up on a block.

4) Yes, ultimately, if you have a replication factor of n and the n
replicas are lost at the same time, well, the data is truly lost. But
that's not specific to Hadoop.
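
To make points 2) to 4) concrete, here is a toy Python model of the three steps in the question. It is only a sketch, not Hadoop code: ToyNameNode and all of its methods are made-up names. It assumes that the set of known blocks is kept separately from the replica locations reported by datanodes, and that the disk failure happens before re-replication has had time to run.

```python
# Toy model only: ToyNameNode and its methods are hypothetical names,
# not part of any Hadoop API.

class ToyNameNode:
    def __init__(self, replication):
        self.replication = replication
        self.blocks = set()   # block IDs recorded in the namespace
        self.locations = {}   # datanode name -> block IDs it last reported

    def add_block(self, block_id):
        self.blocks.add(block_id)

    def block_report(self, datanode, block_ids):
        # Replica locations are whatever the datanode last reported.
        self.locations[datanode] = set(block_ids)

    def datanode_lost(self, datanode):
        # Forget the locations of a dead node, but keep the block IDs.
        self.locations.pop(datanode, None)

    def live_replicas(self, block_id):
        return sum(block_id in ids for ids in self.locations.values())

    def missing_blocks(self):
        return sorted(b for b in self.blocks if self.live_replicas(b) == 0)

    def under_replicated(self):
        return sorted(
            b for b in self.blocks
            if 0 < self.live_replicas(b) < self.replication
        )


nn = ToyNameNode(replication=2)
for block in ("blk_1", "blk_2"):
    nn.add_block(block)
nn.block_report("dn_a", ["blk_1", "blk_2"])
nn.block_report("dn_b", ["blk_1"])
nn.block_report("dn_c", ["blk_2"])

# Step 1: dn_a is shut down for maintenance.
nn.datanode_lost("dn_a")
after_step1_under = nn.under_replicated()

# Step 2: the disk holding blk_1 dies on dn_b, so dn_b reports nothing.
nn.block_report("dn_b", [])
after_step2_missing = nn.missing_blocks()
still_known = "blk_1" in nn.blocks

# Step 3: dn_a comes back and reports the blocks it still holds.
nn.block_report("dn_a", ["blk_1", "blk_2"])
after_step3_missing = nn.missing_blocks()
after_step3_under = nn.under_replicated()

print(after_step1_under, after_step2_missing, still_known)
print(after_step3_missing, after_step3_under)
```

In this model blk_1 is reported missing after step 2 but is never forgotten, so it becomes readable again after step 3 and is then only under-replicated. If dn_a never came back, it would stay missing, which is the case described in 4).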

Bertrand
On Sun, Oct 27, 2013 at 7:42 PM, Koert Kuipers <[EMAIL PROTECTED]> wrote:

> i have a cluster with replication factor 2. with the following events in
> this order, do i have data loss?
>
> 1) shut down a datanode for maintenance unrelated to hdfs. so now some
> blocks only have replication factor 1
>
> 2) a disk dies in another datanode. let's assume some blocks now have
> replication factor 0 since they were on this disk that died and on the
> datanode that is shut down for maintenance.
>
> 3) bring back up the datanode that was down for maintenance.
>
> what i am worried about is that the namenode gives up on a block with
> replication factor 0 after steps 1) and 2) and considers it lost, and by
> the time the replica comes back online in step 3) the namenode no
> longer considers the block to exist.
>
> thanks! koert
>
>
--
Bertrand Dechoux