Re: Heartbeat interval and timeout: why 3 secs and 10 min?
Suresh Srinivas 2013-03-13, 06:15
You are right: in the heartbeat response, the namenode sends commands to the
datanode. Commands sent this way include deletion of blocks, replication,
block recovery, secret key updates, etc. Increasing the heartbeat interval
results in the namenode not being able to quickly act on events in the
cluster and send commands to datanodes.
Declaring a datanode dead is done with a much more conservative value of 10
minutes to avoid unnecessarily replicating the data, which would affect
both storage and cluster bandwidth.
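To put numbers on the two values discussed in this thread, here is a minimal arithmetic sketch. It assumes the dead-node timeout is derived as 2 * recheck interval + 10 * heartbeat interval, with defaults of 5 minutes for dfs.namenode.heartbeat.recheck-interval and 3 seconds for dfs.heartbeat.interval; please verify against your Hadoop version. The function names are illustrative only and are not HDFS APIs.

```python
# Illustrative arithmetic only; function names are hypothetical, not HDFS APIs.

HEARTBEAT_INTERVAL_S = 3      # default heartbeat interval discussed in this thread
RECHECK_INTERVAL_S = 5 * 60   # assumed default recheck interval (5 minutes)


def dead_node_timeout_s(heartbeat_s=HEARTBEAT_INTERVAL_S,
                        recheck_s=RECHECK_INTERVAL_S):
    """Assumed formula: 2 * recheck interval + 10 * heartbeat interval."""
    return 2 * recheck_s + 10 * heartbeat_s


def heartbeats_in_window(timeout_s, heartbeat_s=HEARTBEAT_INTERVAL_S):
    """Number of whole heartbeat intervals that fit in the timeout window."""
    return timeout_s // heartbeat_s


if __name__ == "__main__":
    # The round figure from the thread: 10 minutes at one heartbeat per 3 seconds.
    print(heartbeats_in_window(10 * 60))
    # The timeout under the assumed formula, and the heartbeats it spans.
    timeout = dead_node_timeout_s()
    print(timeout, heartbeats_in_window(timeout))
```

Under these assumptions the heartbeat interval only contributes a small share of the timeout, so the interval can be tuned for command latency without changing the dead-node threshold much.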
On Tue, Mar 12, 2013 at 5:29 PM, André Oriani <[EMAIL PROTECTED]> wrote:
> No take on this one?
> In Zookeeper the heartbeats happen every third of the timeout. If I am
> not mistaken, the recommended timeout is more than 2 minutes to avoid false
> positives.
> But I still cannot see the relationship in HDFS between heartbeat interval
> and timeout. Okay, 10 minutes seems to be a conservative value to avoid
> false positives in a big cluster. But that means 200 heartbeats. Heartbeats
> on HDFS are not only used for liveness detection but also to send
> information about free space and load and to receive commands from
> NameNode. So they are also essential for block placement decisions and for
> ensuring the replication levels. Would that then be the reason why heartbeats
> are so frequent? A lot can happen to a DataNode in just three seconds?
> André Oriani
> On Thu, Mar 7, 2013 at 10:37 PM, André Oriani <[EMAIL PROTECTED]> wrote:
> > Hi,
> > Is there any particular reason why the default heartbeat interval is 3
> > seconds and the timeout is 10 minutes? Everywhere I looked (code, Google,
> > ...) only mentions the values but gives no clue as to why those values were
> > chosen.
> > Thanks in advance,
> > André Oriani