NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
Zookeeper >> mail # user >> adding a separate thread to detect network timeouts faster


Re: adding a separate thread to detect network timeouts faster
Hello Jeremy and all,

My understanding was that the current implementation of ping handling
already does not wait on disk IO.
I am even working on a JIRA issue that is related to this:
https://issues.apache.org/jira/browse/ZOOKEEPER-87
I have also run some tests that seem to confirm that pings are handled in
a different thread than transactions.
However, nobody on the project has confirmed this yet. Are you sure that
ping handling waits on IO for anything? Have you tested it?

Regards,
Germán Blanco.
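Germán's point is that a ping answered on its own thread never queues behind a transaction's disk sync. The toy model below is plain Python, not ZooKeeper's code; `worker`, `ping_delays` and the 0.3 s sync time are hypothetical. It contrasts a single shared inbox with a separate ping thread and measures how long each ping waits before being answered.

```python
import queue
import threading
import time

# Toy model, not ZooKeeper's code: "txn" items block on a simulated disk sync,
# "ping" items need no disk I/O. SLOW_SYNC_SECONDS is an assumed value.
SLOW_SYNC_SECONDS = 0.3


def worker(inbox, delays):
    """Process (kind, enqueued_at) items until a None sentinel arrives."""
    while True:
        item = inbox.get()
        if item is None:
            return
        kind, enqueued_at = item
        if kind == "txn":
            time.sleep(SLOW_SYNC_SECONDS)  # stand-in for fsync on a shared disk
        else:
            # A ping is answered immediately; its delay is pure queueing time.
            delays.append(time.monotonic() - enqueued_at)


def ping_delays(separate_ping_thread, num_pings=3):
    """Submit one slow transaction, then some pings; return each ping's delay."""
    txn_inbox = queue.Queue()
    # With one shared inbox, pings wait behind the transaction's disk sync.
    ping_inbox = queue.Queue() if separate_ping_thread else txn_inbox
    inboxes = [txn_inbox, ping_inbox] if separate_ping_thread else [txn_inbox]
    delays = []
    threads = [threading.Thread(target=worker, args=(q, delays)) for q in inboxes]
    for t in threads:
        t.start()
    txn_inbox.put(("txn", time.monotonic()))
    time.sleep(0.05)  # pings arrive while the sync is in progress
    for _ in range(num_pings):
        ping_inbox.put(("ping", time.monotonic()))
    for q in inboxes:
        q.put(None)  # one shutdown sentinel per worker
    for t in threads:
        t.join()
    return delays
```

With a separate ping thread the delays stay near zero; with the shared inbox each ping waits roughly the remainder of the simulated sync (about 0.25 s here). This only models the claim; it does not test ZooKeeper itself, which is what Germán is asking about.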

On Tue, Sep 10, 2013 at 11:05 PM, Jeremy Stribling <[EMAIL PROTECTED]> wrote:

> Good suggestion, thanks.  At the very least, I think what we have in mind
> would be off by default, so users could only turn it on if they know they
> have relatively few clients and slow disks.  An adaptive scheme would be
> even better, obviously.
>
>
> On 09/10/2013 02:04 PM, Ted Dunning wrote:
>
>>
>> Perhaps you should be suggesting a design that is adaptive rather than
>> statically configured, and that guarantees low overhead at the cost of
>> slower notification in extreme scenarios.
>>
>> For instance, the server can send no more than 1,000 (or whatever number)
>> heartbeats (HBs) per second in total, and never more than one per second
>> to any single client.  This caps the cost nicely.
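Ted's two caps combine into a per-client heartbeat interval of max(1 s, N / 1000 s) for N clients. A minimal sketch of that arithmetic, with hypothetical function names and the 1,000/s and 1/s limits as defaults:

```python
def heartbeat_interval(num_clients, global_cap_per_s=1000.0, per_client_cap_per_s=1.0):
    """Seconds between heartbeats to each client under both caps.

    The global budget is shared evenly across clients, but no client is
    probed more often than per_client_cap_per_s.
    """
    if num_clients <= 0:
        raise ValueError("num_clients must be positive")
    per_client_rate = min(per_client_cap_per_s, global_cap_per_s / num_clients)
    return 1.0 / per_client_rate


def total_heartbeat_rate(num_clients, **caps):
    """Heartbeats per second the server sends to all clients combined."""
    return num_clients / heartbeat_interval(num_clients, **caps)
```

With 10 clients each one is probed every second; with 100,000 clients the interval stretches to 100 seconds while the total stays at 1,000 heartbeats per second. That longer interval is the slower notification in extreme scenarios that this design trades for a bounded cost.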
>>
>>
>>
>> On Tue, Sep 10, 2013 at 1:59 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
>>
>>
>>     Since you are talking about client connection failure detection,
>>     no, I don't think that there is a major barrier other than
>>     actually implementing a reliable check.
>>
>>     Keep in mind the cost.  There are ZK installs with 100,000
>>     clients.  If these are heartbeating every 2 seconds, you have
>>     50,000 packets per second hitting the quorum, or 10,000 per server
>>     in a five-server ensemble with well-balanced connections.
>>
>>     If you only have 10 clients, the network burden is nominal.
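The load figures follow from dividing the client count by the heartbeat interval; the per-server number implies a five-server ensemble (50,000 / 10,000 = 5). A small helper (hypothetical name) reproduces both, assuming connections are spread evenly:

```python
def heartbeat_load(num_clients, interval_s, num_servers):
    """Return (packets/s hitting the ensemble, packets/s per server).

    Assumes every client heartbeats once per interval_s and that
    connections are spread evenly across num_servers.
    """
    total = num_clients / interval_s
    return total, total / num_servers
```

`heartbeat_load(100_000, 2, 5)` gives `(50000.0, 10000.0)`, while `heartbeat_load(10, 2, 5)` gives `(5.0, 1.0)`, the nominal burden Ted mentions.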
>>
>>
>>
>>     On Tue, Sep 10, 2013 at 1:34 PM, Jeremy Stribling
>>     <[EMAIL PROTECTED]> wrote:
>>
>>         I mostly agree, but let's assume that a ~5x speedup in
>>         detecting those types of failures matters to some
>>         people. Are there technical reasons that would
>>         prevent this idea from working?
>>
>>         On 09/10/2013 01:31 PM, Ted Dunning wrote:
>>
>>             I don't see the strong value here.  A few failures would
>>             be detected more quickly, but I am not convinced that
>>             this would actually improve functionality significantly.
>>
>>
>>             On Tue, Sep 10, 2013 at 1:01 PM, Jeremy Stribling
>>             <[EMAIL PROTECTED]> wrote:
>>
>>                 Hi all,
>>
>>                 Let's assume that you wanted to deploy ZK in a virtualized
>>                 environment, despite all of the known drawbacks.  Assume we
>>                 could deploy it such that the ZK servers were all using
>>                 independent CPUs and storage (though not dedicated disks).
>>                 Obviously, the shared disks (shared with other, non-ZK VMs
>>                 on the same hypervisor) will cause ZK to hit the default
>>                 session timeout occasionally, so you would need to raise the
>>                 existing session timeout to something like 30 seconds.
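Whether a 30-second timeout takes effect depends on the server: ZooKeeper clamps each requested session timeout to [minSessionTimeout, maxSessionTimeout], which default to 2 and 20 times tickTime. The Python sketch below mirrors that rule (it is not ZooKeeper's code, and the function name is hypothetical); `tick_time_ms=2000` is the value used in the sample configuration, giving a 40-second ceiling.

```python
def negotiated_session_timeout_ms(requested_ms, tick_time_ms=2000,
                                  min_session_timeout_ms=None,
                                  max_session_timeout_ms=None):
    """Clamp a client's requested session timeout to the server's bounds.

    Unset bounds fall back to 2 and 20 ticks, the documented defaults for
    minSessionTimeout and maxSessionTimeout.
    """
    lower = (min_session_timeout_ms if min_session_timeout_ms is not None
             else 2 * tick_time_ms)
    upper = (max_session_timeout_ms if max_session_timeout_ms is not None
             else 20 * tick_time_ms)
    return max(lower, min(requested_ms, upper))
```

Under those defaults a 30-second request is granted unchanged, whereas anything above 40 seconds would need maxSessionTimeout raised in the server configuration.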
>>
>>                 I'm curious whether there would be any technical drawbacks
>>                 to adding a second heartbeat mechanism between the clients
>>                 and the servers, with the goal of detecting network-only
>>                 failures faster than the existing heartbeat mechanism.  The
>>                 idea is that there would be a new thread dedicated to
>>                 processing these heartbeats, which would
>
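The proposal, truncated in this archive, describes a dedicated heartbeat thread. One way the client side could look, as a sketch only: `NetworkHeartbeatMonitor`, `probe`, `interval` and `network_timeout` are hypothetical names, not part of the ZooKeeper client API, and `probe` stands in for whatever lightweight request the server would answer.

```python
import threading
import time


class NetworkHeartbeatMonitor:
    """Hypothetical client-side detector for network-only failures.

    Runs on its own thread, separate from the normal session heartbeat, and
    calls on_failure once if no probe has succeeded for network_timeout seconds.
    """

    def __init__(self, probe, interval, network_timeout, on_failure):
        self._probe = probe                  # callable: True if the server replied
        self._interval = interval            # seconds between probes
        self._network_timeout = network_timeout
        self._on_failure = on_failure
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        last_ack = time.monotonic()
        # Event.wait returns False on timeout, so this loops once per interval
        # until stop() is called.
        while not self._stop.wait(self._interval):
            if self._probe():
                last_ack = time.monotonic()
            elif time.monotonic() - last_ack >= self._network_timeout:
                self._on_failure()
                return  # report once; reconnect policy is up to the caller
```

For instance, a hypothetical `network_timeout` of 6 seconds next to a 30-second session timeout would give the ~5x faster detection Jeremy mentions. The probe only isolates network failures if the server answers it on a path that never waits on disk IO, which is exactly the question Germán raises.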