RE: zookeeper client retry logic..
Hi Camille,

Thanks for the reply. Could you please go through the following cases:

Case-1:-
Say, I have 5 servers zk1, zk2, zk3, zk4, zk5 and have configured sessionTimeOut=60secs
readTimeOut = 60 * 2 / 3 = 40secs
connectionTimeOut = 60/servers.length = 60/5 = 12secs
 
step1: Say the client has established a connection with zk1.
step2: Shut down zk1 and zk2. Since readTimeOut is 40s, it will take 40s before the first retry to the next server.
step3: Say the client retries zk2; that will take at most 12s (connectionTimeOut). Now a total of 52secs of the session has elapsed, leaving only 8secs before session expiration.

Retry intervals are as follows >> 40s, 12s, 8s
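
For illustration, here is a minimal Java sketch of the Case-1 arithmetic (this is not the actual ClientCnxn code; the class and variable names are made up):

    // Sketch of the Case-1 timing described above (hypothetical, illustration only).
    public class RetryTimingCase1 {
        public static void main(String[] args) {
            int sessionTimeout = 60_000;               // 60secs, as configured above
            int serverCount = 5;                       // zk1..zk5

            int readTimeout = sessionTimeout * 2 / 3;          // 40secs before the first retry
            int connectTimeout = sessionTimeout / serverCount; // 12secs per connection attempt

            int elapsed = readTimeout + connectTimeout;        // 52secs after one failed reconnect
            int remaining = sessionTimeout - elapsed;          // only 8secs of session budget left

            // prints: readTimeout=40s, connectTimeout=12s, remaining=8s
            System.out.printf("readTimeout=%ds, connectTimeout=%ds, remaining=%ds%n",
                    readTimeout / 1000, connectTimeout / 1000, remaining / 1000);
        }
    }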

Case-2:-
Now consider the 'R-O server' (read-only) feature: 5 servers started in R-O mode and sessionTimeOut=60secs
step1: The client has established a connection with zk1.
step2: Shut down zk1, zk2 and zk3.
step3: The client spends 40s on readTimeOut, then 12s, then 8s, and the session expires before the next retry, even though zk4 and zk5 are still running in R-O mode and are able to retain the client session.

If we use 'servers.length' in the formula we can improve the retries, or should we think of a better formula?
(Note: even after considering servers.length, I feel there is still a small gap, since the client never retries the fifth server.)

readTimeOut = 60 * 2 / 5 (servers.length) = 24secs.
Retry intervals are as follows >> 24s, 12s, 12s, 12s
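
A minimal Java sketch of the proposed calculation, with servers.length in place of the constant 3 (the helper methods below are hypothetical, not the actual ClientCnxn code):

    // Current vs. proposed readTimeOut (hypothetical sketch, illustration only).
    public class ProposedReadTimeout {

        // Current ClientCnxn formula: sessionTimeout * 2 / 3.
        static int currentReadTimeout(int sessionTimeout) {
            return sessionTimeout * 2 / 3;
        }

        // Proposed: divide by the number of servers instead of the constant 3.
        static int proposedReadTimeout(int sessionTimeout, int serverCount) {
            return sessionTimeout * 2 / serverCount;
        }

        public static void main(String[] args) {
            int servers = 5;  // zk1..zk5

            // 60secs session (this mail's example): 40000 vs 24000 (40secs vs 24secs)
            System.out.println(currentReadTimeout(60_000) + " / " + proposedReadTimeout(60_000, servers));

            // 120secs session (the example in the quoted mail below): 80000 vs 48000 (80secs vs 48secs)
            System.out.println(currentReadTimeout(120_000) + " / " + proposedReadTimeout(120_000, servers));
        }
    }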

IMO, 'readTimeOut' is presently not calculated based on the quorum size, but it would be good to have a shorter timeout so that retries are spread more fairly across all the servers.

Thanks,
Rakesh
________________________________________
From: Camille Fournier [[EMAIL PROTECTED]]
Sent: Tuesday, January 03, 2012 5:51 AM
To: [EMAIL PROTECTED]
Subject: Re: zookeeper client retry logic..

It's an interesting idea... can you explain more why you think it
would be good to have a shorter timeout in the case of a longer list
of servers?

Thanks,
C

On Mon, Jan 2, 2012 at 2:08 AM, Rakesh R <[EMAIL PROTECTED]> wrote:
> Hi everyone,
>
>
>
> In ClientCnxn, 'readTimeOut' is calculated as follows:
>
>    readTimeOut = sessionTimeOut * 2 / 3; // here it is not considering the server list. If the server list grows beyond 3, it will not give a fair chance to retry all the servers (in the worst case).
>
>
>
> Can we think of changing the 'readTimeOut' logic by using the server list's length instead of the constant/magic number '3'?
>
>
>
> For example:-
>
> I have 5 servers and client sessionTimeOut=120secs
>
>
>
> readTimeOut = 120 * 2 / 3 = 80secs
>
>
>
> In this case, it takes 80secs for the first timeout if the connected server is not responding. This is a long time; if we consider the server list, the client could retry the next server much sooner, in <50secs.
>
>
>
>
>
> Thanks & Regards,
>
> Rakesh
>
>
>
>