Zookeeper, mail # dev - zookeeper client retry logic..


Rakesh R 2012-01-02, 07:08
Camille Fournier 2012-01-03, 00:21
Rakesh R 2012-01-03, 05:36
Re: zookeeper client retry logic..
Benjamin Reed 2012-01-03, 18:04
this is a good observation. one problem with this reasoning is that
the timeout should really be based on expected latencies, which are
independent of the number of servers. one problem you may run into
with this logic is that if you have a bunch of servers, like 9, your
timeouts may get too small and you will get false timeouts.
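
to put rough numbers on this, here is a minimal sketch of the arithmetic. the formulas are the ones quoted in this thread (readTimeOut = sessionTimeOut * 2 / 3, connectionTimeOut = sessionTimeOut / servers.length, and the proposed readTimeOut = sessionTimeOut * 2 / servers.length). the class and method names are made up for illustration and are not taken from ClientCnxn.

```java
// Illustrative only: hypothetical helper names, not ZooKeeper's ClientCnxn code.
final class TimeoutSketch {

    // Rule quoted in the thread: readTimeOut = sessionTimeOut * 2 / 3.
    static int currentReadTimeoutMs(int sessionTimeoutMs) {
        return sessionTimeoutMs * 2 / 3;
    }

    // Rule quoted in the thread: connectionTimeOut = sessionTimeOut / servers.length.
    static int connectTimeoutMs(int sessionTimeoutMs, int serverCount) {
        return sessionTimeoutMs / serverCount;
    }

    // Proposal from the thread: readTimeOut = sessionTimeOut * 2 / servers.length.
    static int proposedReadTimeoutMs(int sessionTimeoutMs, int serverCount) {
        return sessionTimeoutMs * 2 / serverCount;
    }

    public static void main(String[] args) {
        int sessionTimeoutMs = 60_000;
        // Print all three values side by side for a few ensemble sizes.
        for (int servers : new int[] {3, 5, 9}) {
            System.out.printf(
                "servers=%d currentRead=%dms connect=%dms proposedRead=%dms%n",
                servers,
                currentReadTimeoutMs(sessionTimeoutMs),
                connectTimeoutMs(sessionTimeoutMs, servers),
                proposedReadTimeoutMs(sessionTimeoutMs, servers));
        }
    }
}
```

with a 60s session and 9 servers, both server-count-based values drop well below the 5-server figures from the quoted cases, which is where the false timeouts would come from if real latencies do not shrink along with them.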

to correct the problem you are pointing out, it seems like it would be
better to try reconnections in parallel rather than serially.
of course this would be much harder to implement.
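
a very rough sketch of what "in parallel" could look like, assuming a thread pool and a pluggable connect hook. this is not how the existing client is structured. the Connector interface and connectToAny method are hypothetical names, and a real version would also have to close the connections that lose the race and fit into the client's existing I/O code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative only: hypothetical names, not part of the ZooKeeper client.
final class ParallelReconnectSketch {

    /** Hypothetical hook: returns the server it connected to, or throws on failure. */
    interface Connector {
        String connect(String server) throws Exception;
    }

    /**
     * Tries all servers at once and returns the first one that connects.
     * Throws if every attempt fails or none succeeds within timeoutMs.
     */
    static String connectToAny(List<String> servers, Connector connector, long timeoutMs)
            throws Exception {
        if (servers.isEmpty()) {
            throw new IllegalArgumentException("no servers to try");
        }
        ExecutorService pool = Executors.newFixedThreadPool(servers.size());
        try {
            List<Callable<String>> attempts = new ArrayList<>();
            for (String server : servers) {
                attempts.add(() -> connector.connect(server));
            }
            // invokeAny returns the result of the first attempt that completes
            // successfully and cancels the attempts still running.
            return pool.invokeAny(attempts, timeoutMs, TimeUnit.MILLISECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> servers = Arrays.asList("zk1", "zk2", "zk3", "zk4", "zk5");
        // Fake connector for the demo: zk1 and zk2 are "down", the rest accept.
        Connector fake = server -> {
            if (server.equals("zk1") || server.equals("zk2")) {
                throw new java.io.IOException(server + " is down");
            }
            return server;
        };
        System.out.println("connected to " + connectToAny(servers, fake, 12_000));
    }
}
```

the point of the sketch is only that a dead server costs no extra wall-clock time when attempts run side by side, so the per-attempt timeout no longer has to shrink as the server list grows.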

also, the 1/2 and 2/3 timeout rules are quite arbitrary. they seem
reasonable, but there wasn't deep thought put into them. it would be
nice perhaps to make the thresholds configurable or come up with a
clever mechanism to figure out the right timeouts.
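
as a sketch of the "configurable" option only: the fraction could be read from a system property, with the current 2/3 as the default. the property name below is invented for this example and does not exist in ZooKeeper; the class and method names are hypothetical as well.

```java
// Illustrative only: the property name and class are assumptions, not ZooKeeper settings.
final class ConfigurableTimeoutSketch {

    // Hypothetical property name, made up for this sketch.
    static final String READ_FRACTION_PROP = "sketch.readTimeoutFraction";

    // Returns the configured fraction, or the current 2/3 rule if the property is unset.
    static double readFraction() {
        String raw = System.getProperty(READ_FRACTION_PROP);
        if (raw == null) {
            return 2.0 / 3.0;
        }
        double fraction = Double.parseDouble(raw);
        if (!(fraction > 0.0 && fraction < 1.0)) {
            throw new IllegalArgumentException(
                READ_FRACTION_PROP + " must be between 0 and 1, got " + raw);
        }
        return fraction;
    }

    // readTimeOut as a fraction of the session timeout, rounded to the nearest millisecond.
    static int readTimeoutMs(int sessionTimeoutMs, double fraction) {
        return (int) Math.round(sessionTimeoutMs * fraction);
    }

    public static void main(String[] args) {
        int sessionTimeoutMs = 60_000;
        System.out.println("readTimeOut = "
            + readTimeoutMs(sessionTimeoutMs, readFraction()) + "ms");
    }
}
```

this leaves the default behaviour unchanged and only moves the constant out of the code; it does nothing to pick a good value, which is the harder part.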

ben

On Mon, Jan 2, 2012 at 9:36 PM, Rakesh R <[EMAIL PROTECTED]> wrote:
> Hi Camille,
>
> Thanks for the reply. Could you please go through the following cases:
>
> Case-1:-
> Say, I have 5 servers zk1,zk2,zk3,zk4,zk5 and configured sessionTimeOut=60secs
> readTimeOut = 60 * 2 / 3 and is 40secs
> connectionTimeOut = 60/servers.length = 60/5 = 12secs
>
> step1: Say, client has established connection with zk1.
> step2: Shut down zk1 and zk2. Since readTimeOut is 40s, it will take 40s before the first retry to the next server.
> step3: Say the client retries zk2; this will take at most 12s (connectionTimeOut). The client session has now used 52secs in total, leaving only 8secs before session expiration.
>
> Retry intervals are as follows >> 40s, 12s, 8s
>
> Case-2:-
> Also consider the 'R-O server' feature: start 5 servers in R-O mode and configure sessionTimeOut=60secs
> step1: Client has established connection with zk1.
> step2: Shut down zk1, zk2, zk3
> step3: The client spends 40s on readTimeOut, then 12s, then 8s, and the client session will expire before the next retry. But zk4 and zk5 are running in R-O mode and are able to retain the client session.
>
> Say, if we consider 'servers.length', can we improve the retries, or shall we think of a better formula?
> (Note:- Even after considering servers.length, I feel there is still a small gap: the fifth server is not retried)
>
> readTimeOut = 60 * 2 / 5 (servers.length) and is 24secs.
> Retry intervals are as follows >> 24s, 12s, 12s, 12s
>
> IMO, presently 'readTimeOut' is not calculated based on the quorum strength, but it would be good to have a shorter timeout for fairer retries.
>
> Thanks,
> Rakesh
>
>
> ________________________________________
> From: Camille Fournier [[EMAIL PROTECTED]]
> Sent: Tuesday, January 03, 2012 5:51 AM
> To: [EMAIL PROTECTED]
> Subject: Re: zookeeper client retry logic..
>
> It's an interesting idea... can you explain more why you think it
> would be good to have a shorter timeout in the case of a longer list
> of servers?
>
> Thanks,
> C
>
> On Mon, Jan 2, 2012 at 2:08 AM, Rakesh R <[EMAIL PROTECTED]> wrote:
>> Hi everyone,
>>
>>
>>
>> In ClientCnxn, 'readTimeOut' is calculated as follows:
>>
>>    readTimeOut = sessionTimeOut * 2 / 3; // here it does not consider the server list. If the server list grows beyond 3, it will not give a fair chance to retry all the servers (in the worst case).
>>
>>
>>
>> Can we think of changing the 'readTimeOut' logic to use serverslist.length instead of the constant/magic number '3'?
>>
>>
>>
>> For example:-
>>
>> I have 5 servers and client sessionTimeOut=120secs
>>
>>
>>
>> readTimeOut = 120 * 2 / 3 and is 80secs
>>
>>
>>
>> In this case, it takes 80secs for the first timeout if the connected server is not responding. This is a long time; if we consider the server list, it can retry the next server in <50secs.
>>
>>
>>
>>
>>
>> Thanks & Regards,
>>
>> Rakesh
>>
>>
>>
>>
Patrick Hunt 2012-01-03, 18:55
Thomas Koch 2012-01-03, 14:41