Re: Heartbeat interval and timeout: why 3 secs and 10 min?
Colin McCabe 2013-03-13, 18:08
My understanding is that the 10 minute timeout helps to avoid replication
storms, especially during startup.
You might be interested in HDFS-3703, which adds a "stale" state that
datanodes are placed into after 30 seconds of missed heartbeats. (This is
an optional feature controlled by dfs.namenode.check.stale.datanode.)
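To make the three states concrete, here is a minimal sketch of how a
datanode could be classified from the time since its last heartbeat. This
is not the NameNode's actual code: the function names are made up for
illustration, the thresholds are just the numbers mentioned in this thread,
and treating the boundaries as inclusive is an assumption.

```python
# Illustrative only: thresholds taken from the values discussed in this thread.
HEARTBEAT_INTERVAL_S = 3   # default heartbeat interval
STALE_AFTER_S = 30         # stale threshold described for HDFS-3703
DEAD_AFTER_S = 10 * 60     # the 10 minute timeout


def classify(seconds_since_last_heartbeat):
    """Return 'live', 'stale', or 'dead' (hypothetical helper, not an HDFS API)."""
    # Assumption: a node changes state exactly at the threshold (>=).
    if seconds_since_last_heartbeat >= DEAD_AFTER_S:
        return "dead"
    if seconds_since_last_heartbeat >= STALE_AFTER_S:
        return "stale"
    return "live"


def missed_heartbeats(seconds_since_last_heartbeat):
    """Count whole heartbeat intervals that have elapsed without contact."""
    return int(seconds_since_last_heartbeat // HEARTBEAT_INTERVAL_S)
```

With these numbers a node becomes stale after about 10 missed heartbeats
and is only declared dead after about 200, so a short hiccup does not
trigger re-replication.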
On Tue, Mar 12, 2013 at 5:29 PM, André Oriani <[EMAIL PROTECTED]> wrote:
> No take on this one?
> In ZooKeeper the heartbeats happen on every third of the timeout. If I am
> not mistaken, the recommended timeout is more than 2 minutes to avoid false
> positives.
> But I still cannot see the relationship in HDFS between heartbeat interval
> and timeout. Okay, 10 minutes seems to be a conservative value to avoid
> false positives in a big cluster. But that means 200 heartbeats. Heartbeats
> on HDFS are not only used for liveness detection but also to send
> information about free space and load and to receive commands from
> NameNode. So they are also essential for block placement decisions and for
> ensuring the replication levels. Would that then be the reason why heartbeats
> are so frequent? Can a lot really happen to a DataNode in just three seconds?
> André Oriani
> On Thu, Mar 7, 2013 at 10:37 PM, André Oriani <[EMAIL PROTECTED]> wrote:
> > Hi,
> > Is there any particular reason why the default heartbeat interval is 3
> > seconds and the timeout is 10 minutes? Everywhere I looked (code, Google,
> > ...) only mentions the values, with no clue as to why those values were
> > chosen.
> > Thanks in advance,
> > André Oriani