Zookeeper, mail # user - Zookeeper delay to reconnect


Re: Zookeeper delay to reconnect
Michi Mutsuzaki 2012-09-27, 21:28
Hi Sergei,

Your suggestion sounds reasonable to me. I think the sleep was added
so that the client doesn't spin when the entire ZooKeeper ensemble is
down. The client could try to connect to each server without sleeping,
and sleep for 1 second only after failing to connect to all the
servers in the cluster.
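
A rough sketch of that idea, in case it helps the discussion. This is not the actual ClientCnxn code: the class `ReconnectSketch`, the `Sleeper` hook, the `connect` method and its `tryConnect` predicate are all hypothetical names made up for illustration. It tries every server in the list back to back and only sleeps (a random delay below `maxSleepMs`, mirroring the `r.nextInt(1000)` in the quoted code) once a full pass over the list has failed.

```java
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;

public class ReconnectSketch {
    /** Hypothetical pause hook, so the delay can be stubbed out or counted. */
    interface Sleeper {
        void sleep(long millis) throws InterruptedException;
    }

    /**
     * Tries each server once with no delay between attempts. Sleeps for a
     * random 0..maxSleepMs-1 ms only after every server in the list has
     * failed. Returns the server that accepted the connection, or null if
     * maxPasses full passes all failed.
     */
    static String connect(List<String> servers, Predicate<String> tryConnect,
                          Sleeper sleeper, Random r, int maxSleepMs, int maxPasses)
            throws InterruptedException {
        for (int pass = 0; pass < maxPasses; pass++) {
            for (String server : servers) {
                if (tryConnect.test(server)) {
                    return server; // connected without any sleep in this pass
                }
            }
            // The whole list failed: back off so the client does not spin.
            sleeper.sleep(r.nextInt(maxSleepMs));
        }
        return null;
    }

    public static void main(String[] args) throws InterruptedException {
        // Assumed host names, for illustration only.
        List<String> servers = List.of("zk1:2181", "zk2:2181", "zk3:2181");
        // Simulated rolling restart: only zk1 is unavailable.
        String chosen = connect(servers, s -> !s.equals("zk1:2181"),
                Thread::sleep, new Random(), 1000, 3);
        System.out.println("connected to " + chosen);
    }
}
```

With one server down, as in a rolling restart, the client moves straight on to the next server and never sleeps. The sleep only kicks in when the whole list is unreachable, which is the case it was meant to guard against.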

Thanks!
--Michi

On Thu, Sep 27, 2012 at 1:34 PM, Sergei Babovich
<[EMAIL PROTECTED]> wrote:
> Hi,
> Zookeeper implements a delay of up to 1 second before trying to reconnect.
>
> ClientCnxn$SendThread
>         @Override
>         public void run() {
>             ...
>             while (state.isAlive()) {
>                 try {
>                     if (!clientCnxnSocket.isConnected()) {
>                         if(!isFirstConnect){
>                             try {
>                                 Thread.sleep(r.nextInt(1000));
>                             } catch (InterruptedException e) {
>                                 LOG.warn("Unexpected exception", e);
>                             }
>
> This creates "outages" of up to 1s (even with a simple retry on
> ConnectionLoss), even when the cluster is perfectly healthy, as in a
> rolling restart. In our scenario this might be a problem under high load,
> creating a spike in the number of requests waiting on a zk operation.
> Would it be a better strategy to attempt the reconnect immediately at
> least once? Or is there more to it?

Ben Bangert 2012-09-28, 16:34
Patrick Hunt 2012-09-27, 23:55
Sergei Babovich 2012-09-28, 15:15
Brian Tarbox 2012-09-27, 23:58
Patrick Hunt 2012-09-28, 00:07
Brian Tarbox 2012-09-28, 00:21