

Re: Datanode fencing mechanism
Hi Liu Le,

You're correct, that's an oversight: the persistent promise was designed
but never implemented. It's quite a rare circumstance, but we should
probably implement it as you suggested. Want to have a try at making a
patch for trunk?

-Todd
On Mon, Oct 28, 2013 at 1:57 AM, lei liu <[EMAIL PROTECTED]> wrote:

> The JIRA https://issues.apache.org/jira/browse/HDFS-1972 describes the
> following case:
> Scenario 3: DN restarts during split brain period
>
> (this scenario illustrates why I think we need to persistently record the
> promise about who is active)
>
>    - block has 2 replicas, user asks to reduce to 1
>    - NN1 adds the block to DN1's invalidation queue, but it's backed up
>    behind a bunch of other commands, so doesn't get issued yet.
>    - Failover occurs, but NN1 still thinks it's active.
>    - DN1 promises to NN2 not to accept commands from NN1. It sends an empty
>    deletion report to NN2. Then, it crashes.
>    - NN2 has received a deletion report from everyone, and asks DN2 to
>    delete the block. It hasn't realized that DN1 has crashed yet.
>    - DN2 deletes the block.
>
>
>    - DN1 starts back up. When it comes back up, it talks to NN1 first
>    (maybe it takes a while to connect to NN2 for some reason)
>       - ** Now, if we had saved the "promise" as part of persistent state,
>       we could ignore NN1 and avoid this issue. Otherwise:
>       - NN1 still thinks it's active, and sends a command to DN1 to delete
>       the block. DN1 does so.
>       - We lost the block.
>
>
> I am using CDH4.3.1 and am reading the DataNode code. I can't find where
> the DataNode saves the "promise" as part of persistent state. I want to
> know whether Scenario 3 is handled in CDH4.3.1. If it is handled, where
> is the code?
>
>
> Thanks,
>
> LiuLe
>

--
Todd Lipcon
Software Engineer, Cloudera
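
The "persistent promise" from Scenario 3 can be illustrated with a minimal Java sketch. This is not the DataNode code and not a proposed patch. The class name PersistentPromise, the single-file storage, and the idea that each NameNode's claim to be active carries a monotonically increasing "epoch" number are all assumptions made for illustration. The point is only that the promise is written to disk before it is relied on, so a restarted DataNode can still refuse a stale NameNode.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/**
 * Hypothetical sketch of a promise that survives a DataNode restart.
 * "Epoch" is an assumed number that grows with every failover.
 */
final class PersistentPromise {
    private final Path file;
    private long promisedEpoch; // highest epoch ever acknowledged as active

    PersistentPromise(Path file) throws IOException {
        this.file = file;
        // Reload the promise after a restart; 0 means no promise was made yet.
        if (Files.exists(file)) {
            String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
            this.promisedEpoch = Long.parseLong(text.trim());
        } else {
            this.promisedEpoch = 0L;
        }
    }

    /** Record a NameNode's claim to be active; returns false for a stale claim. */
    public synchronized boolean promise(long epoch) throws IOException {
        if (epoch < promisedEpoch) {
            return false; // an older NameNode; keep the existing promise
        }
        if (epoch > promisedEpoch) {
            // Write a temp file and rename it over the old one, then update memory.
            // A real implementation would also want fsync and an atomic rename.
            Path tmp = file.resolveSibling(file.getFileName() + ".tmp");
            Files.write(tmp, Long.toString(epoch).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
            promisedEpoch = epoch;
        }
        return true;
    }

    /** Should a command (for example a block deletion) from this epoch be obeyed? */
    public synchronized boolean shouldAccept(long epoch) {
        return epoch >= promisedEpoch;
    }
}
```

Mapped onto the scenario, under these assumptions: DN1 calls promise(2) for NN2 before it crashes. After the restart, a new PersistentPromise built from the same file reloads epoch 2, so shouldAccept(1) is false and NN1's delete command is ignored instead of removing the last replica.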