Re: Timeouts and ping handling
Camille Fournier 2012-01-18, 22:03
I think it can be done. Looking through the code, it seems like it should be safe modulo some stats that are set in the FinalRequestProcessor that may be less useful.
A question for the other ZooKeeper devs out there: is there a reason that we handle read-only operations in the first processor differently on the leader than on the followers? The leader (calling PrepRequestProcessor first) will do a session check for any of the read-only requests: zks.sessionTracker.checkSession(request.sessionId, request.getOwner());
but the FollowerRequestProcessor will just push these requests to its second processor, and never check the session. What's the purpose of the session check on the leader but not the followers?
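To make the asymmetry in the question concrete, here is a minimal Java sketch. It is hypothetical and not ZooKeeper's request-processor API: `FirstStageSketch`, `SessionTracker`, `leaderFirstStage` and `followerFirstStage` are illustrative names, and the check is simplified (the real `checkSession` call quoted above also takes the request owner). It only shows that a read from an expired session is rejected on the leader's path but passed through on the follower's.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the leader/follower difference described above.
// Names are illustrative only; this is NOT ZooKeeper's actual API, and the
// real checkSession also takes the request owner.
final class FirstStageSketch {
    static final class SessionExpiredException extends Exception {
        SessionExpiredException(long sessionId) {
            super("session expired: 0x" + Long.toHexString(sessionId));
        }
    }

    /** Minimal stand-in for a session tracker: a set of live session ids. */
    static final class SessionTracker {
        private final Set<Long> live = new HashSet<>();

        void open(long sessionId) { live.add(sessionId); }
        void expire(long sessionId) { live.remove(sessionId); }

        void checkSession(long sessionId) throws SessionExpiredException {
            if (!live.contains(sessionId)) {
                throw new SessionExpiredException(sessionId);
            }
        }
    }

    /** Leader-style first stage: checks the session even for read-only requests. */
    static String leaderFirstStage(SessionTracker tracker, long sessionId)
            throws SessionExpiredException {
        tracker.checkSession(sessionId);
        return "forwarded";
    }

    /** Follower-style first stage: hands read-only requests on with no session check. */
    static String followerFirstStage(long sessionId) {
        return "forwarded";
    }
}
```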
C
On Wed, Jan 18, 2012 at 4:26 PM, Manosiz Bhattacharyya <[EMAIL PROTECTED]> wrote:
> Hello,
>
> We are using Zookeeper-3.3.4 with client session timeouts of 5 seconds, and we see frequent timeouts. We have a cluster of 50 nodes (3 of which are ZK nodes) and each node has 5 client connections (a total of 250 connections to the Ensemble). While investigating the zookeeper connections, we found that sometimes pings sent from the zookeeper client do not return from the server within 5 seconds, and the client connection gets disconnected. Digging deeper, it seems that pings are enqueued the same way as other requests in the three-stage request processing pipeline (prep, sync, finalize) in zkserver. So if there are a lot of write operations from other active sessions in front of a ping from an inactive session in the queues, the inactive session could time out.
>
> My question is whether the server can return the client's ping immediately, as the purpose of the ping seems to be to act as a heartbeat from relatively inactive sessions. If we keep a separate ping queue in the Prep phase which forwards pings straight to the finalize phase, requests ahead of the ping that require I/O in the sync phase would not cause the client timeouts. I hope pings do not generate any order in the database. I did take a cursory look at the code and thought that could be done. Would really appreciate an opinion regarding this.
>
> As an aside, I should mention that increasing the session timeout to 20 seconds did improve the problem significantly. However, as we are using Zookeeper to monitor the health of our components, increasing the timeout means that we only learn of a component's death 20 seconds later. This is something we would definitely like to avoid, and we would like to go back to the 5 second timeout.
>
> Regards,
> Manosiz.
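The head-of-line effect described in the quoted message can be illustrated with a deliberately simplified Java model (not ZooKeeper code). It assumes the three stages process one request at a time with no overlap, that only writes pay a sync (disk) cost, and it ignores batching in the sync stage; the class `PipelineModel` and the millisecond costs are hypothetical numbers chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the prep -> sync -> final pipeline.
// NOT ZooKeeper code: stages are treated as strictly serial, only writes
// pay a sync (disk) cost, and any batching in the sync stage is ignored.
final class PipelineModel {
    enum Kind { PING, READ, WRITE }

    // Illustrative per-stage costs in milliseconds (assumed numbers).
    static final long PREP_MS = 1;
    static final long SYNC_WRITE_MS = 40;
    static final long FINAL_MS = 1;

    static long costOf(Kind kind) {
        long sync = (kind == Kind.WRITE) ? SYNC_WRITE_MS : 0;
        return PREP_MS + sync + FINAL_MS;
    }

    /** Milliseconds until the request at {@code index} completes under FIFO processing. */
    static long completionTimeMs(List<Kind> queue, int index) {
        if (index < 0 || index >= queue.size()) {
            throw new IllegalArgumentException("index out of range: " + index);
        }
        long t = 0;
        for (int i = 0; i <= index; i++) {
            t += costOf(queue.get(i));
        }
        return t;
    }

    public static void main(String[] args) {
        List<Kind> queue = new ArrayList<>();
        for (int i = 0; i < 125; i++) {
            queue.add(Kind.WRITE); // writes from other, busy sessions
        }
        queue.add(Kind.PING);      // heartbeat from an otherwise idle session
        long pingMs = completionTimeMs(queue, queue.size() - 1);
        System.out.println("ping answered after " + pingMs + " ms"); // prints "ping answered after 5252 ms"
    }
}
```

With a 5 second session timeout, the ping in this model returns too late even though the pinging session itself did nothing, which is the situation the quoted message reports.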
Re: Timeouts and ping handling
Patrick Hunt 2012-01-18, 22:45
On Wed, Jan 18, 2012 at 2:03 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:
> I think it can be done. Looking through the code, it seems like it should be safe modulo some stats that are set in the FinalRequestProcessor that may be less useful.
Turning around heartbeats (HBs) at the head end of the server is a bad idea. If the server can't support the timeout you requested, then you are setting yourself up for trouble if you try to fake it. (Think through some of the failure cases...)
This is not something you want to do. Rather, first look at some of the more obvious issues such as GC, then disk (I've seen people go to ramdisks in some cases), then OS/net tuning, etc.
Patrick
Re: Timeouts and ping handling
Camille Fournier 2012-01-18, 23:21
Duh, I knew there was something I was forgetting. You can't process the session timeout faster than the server can process the full pipeline, so making pings come back faster just means you will have a false sense of liveness for your services.
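A tiny Java sketch of the "false sense of liveness" point, under stated assumptions: `LivenessGap` and its methods are hypothetical names, the backlog and request costs are inputs you would have to estimate, and the model simply compares a fast-path ping against a real request that must wait behind the server's backlog.

```java
// Hypothetical illustration (not ZooKeeper code): a ping answered on a fast
// path can report "alive" while real requests queued behind the server's
// backlog cannot complete within the session timeout.
final class LivenessGap {
    /** What the client concludes from a fast-path ping. */
    static boolean pingLooksHealthy(long fastPingMs, long sessionTimeoutMs) {
        return fastPingMs < sessionTimeoutMs;
    }

    /** Whether a real request waiting behind {@code backlogMs} of work finishes in time. */
    static boolean realRequestInTime(long backlogMs, long requestMs, long sessionTimeoutMs) {
        return backlogMs + requestMs < sessionTimeoutMs;
    }

    /** True when the ping says "alive" but the pipeline cannot actually meet the timeout. */
    static boolean falseLiveness(long fastPingMs, long backlogMs, long requestMs,
                                 long sessionTimeoutMs) {
        return pingLooksHealthy(fastPingMs, sessionTimeoutMs)
                && !realRequestInTime(backlogMs, requestMs, sessionTimeoutMs);
    }
}
```

For example, with a 5000 ms timeout, a 2 ms fast-path ping and a 6000 ms backlog, `falseLiveness(2, 6000, 42, 5000)` is true: the client looks connected while the server is effectively unable to serve it in time.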
The question about why the leaders and followers handle read-only requests differently still stands, though.
C
Re: Timeouts and ping handling
Patrick Hunt 2012-01-19, 01:38
On Wed, Jan 18, 2012 at 3:21 PM, Camille Fournier <[EMAIL PROTECTED]> wrote:
> Duh, I knew there was something I was forgetting. You can't process the session timeout faster than the server can process the full pipeline, so making pings come back faster just means you will have a false sense of liveness for your services.
There's also this: we only send HBs when the client is not active. HBs check that the server is alive, but at the same time they also let the server know that we're alive.
However, when the client is active (sending read/write ops) we don't need a HB; any read/write operation serves as the HB. Say we send a read operation to the server: we won't send another HB until the read operation's result comes back (and then not until 1/3 of the timeout after that). In this case you can't take advantage of the hack that's been discussed. The read operation needs to complete, and if it takes too long (as in this case) the session will time out as usual. Now, if you have clients that are largely inactive this may not matter too much, but depending on the use case you might get caught by this.
Patrick