Zookeeper >> mail # user >> adding a separate thread to detect network timeouts faster


Jeremy Stribling 2013-09-10, 20:01
Re: adding a separate thread to detect network timeouts faster
I don't see the strong value here.  A few failures would be detected more
quickly, but I am not convinced that this would actually improve
functionality significantly.
On Tue, Sep 10, 2013 at 1:01 PM, Jeremy Stribling <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> Let's assume that you wanted to deploy ZK in a virtualized environment,
> despite all of the known drawbacks.  Assume we could deploy it such that
> the ZK servers were all using independent CPUs and storage (though not
> dedicated disks).  Obviously, the shared disks (shared with other, non-ZK
> VMs on the same hypervisor) will cause ZK to hit the default session
> timeout occasionally, so you would need to raise the existing session
> timeout to something like 30 seconds.
>
> I'm curious if there would be any technical drawbacks to adding an
> additional heartbeat mechanism between the clients and the servers, which
> would have the goal of detecting network-only failures faster than the
> existing heartbeat mechanism.  The idea is that there would be a new thread
> dedicated to processing these heartbeats, which would not get blocked on
> I/O.  Then the clients could configure a second, smaller timeout value, and
> it would be assumed that any such timeout indicated a real problem.  The
> existing mechanism would still be in place to catch I/O-related errors.
>
> I understand the philosophy that there should be some heartbeat mechanism
> that takes the disk into account, but I'm having trouble coming up with
> technical reasons not to add a second mechanism. Obviously, the advantage
> would be that the clients could detect network failures and system crashes
> more quickly in an environment with slow disks, and fail over to other
> servers more quickly.  The only disadvantages I can come up with are:
>
> 1) More code complexity, and slightly more heartbeat traffic on the wire
> 2) I think the servers have to log session expirations to disk, so if the
> sessions expire at a faster rate than the disk can handle, it might lead to
> a large backlog.
>
> Are there other drawbacks I am missing?  Would a patch that added
> something like this be considered, or is it dead from the start? Thanks,
>
> Jeremy
>
>
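
To make the proposal above concrete, here is a minimal sketch of the client-side decision logic it describes: two independent deadlines, a short one fed by the dedicated heartbeat thread and a long one fed by the existing session heartbeat. This is not ZooKeeper code. The class name, method names, and the 2-second fast timeout are assumptions chosen for illustration; only the 30-second session timeout comes from the message. Timestamps are passed in explicitly so the behavior is easy to follow.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"
    NETWORK_FAILURE = "network_failure"  # fast, I/O-free heartbeat missed
    SESSION_TIMEOUT = "session_timeout"  # existing heartbeat missed


@dataclass
class DualTimeoutDetector:
    """Hypothetical detector with two deadlines (illustrative names only)."""
    fast_timeout: float      # seconds; small, for the dedicated heartbeat thread
    session_timeout: float   # seconds; large, to tolerate slow shared disks
    last_fast_reply: float = 0.0
    last_session_reply: float = 0.0

    def on_fast_reply(self, now: float) -> None:
        # Called when the dedicated heartbeat thread gets an answer.
        self.last_fast_reply = now

    def on_session_reply(self, now: float) -> None:
        # Called when the existing session heartbeat gets an answer.
        self.last_session_reply = now

    def check(self, now: float) -> Verdict:
        # The existing mechanism stays in place and takes precedence.
        if now - self.last_session_reply > self.session_timeout:
            return Verdict.SESSION_TIMEOUT
        # The second, smaller timeout is assumed to signal a real problem.
        if now - self.last_fast_reply > self.fast_timeout:
            return Verdict.NETWORK_FAILURE
        return Verdict.HEALTHY


if __name__ == "__main__":
    detector = DualTimeoutDetector(fast_timeout=2.0, session_timeout=30.0)
    # Slow disk: session replies stalled since t=0, fast replies still arrive.
    detector.on_session_reply(0.0)
    detector.on_fast_reply(10.0)
    print(detector.check(10.5))
    # Network loss after t=10: nothing arrives on either channel.
    print(detector.check(13.0))
```

In this sketch a stalled session heartbeat alone does not trigger failover until the long timeout passes, while silence on the fast channel is reported after only a couple of seconds. Whether the added signal justifies the extra complexity is exactly the point under debate in this thread.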
Other messages in this thread:
Jeremy Stribling 2013-09-10, 20:34
mattdaumen@... 2013-09-10, 20:45
Jeremy Stribling 2013-09-10, 20:48
Ted Dunning 2013-09-10, 20:59
Ted Dunning 2013-09-10, 21:04
Jeremy Stribling 2013-09-10, 21:05
German Blanco 2013-09-11, 05:40
Jeremy Stribling 2013-09-11, 06:32
Michi Mutsuzaki 2013-09-11, 20:36
Rakesh R 2013-09-12, 07:05
Michi Mutsuzaki 2013-09-12, 18:05
Rakesh R 2013-09-13, 06:24