Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper >> mail # dev >> zookeeper_interest returning ZOK on Connection Loss

Yunong Xiao 2012-11-04, 22:11
Michi Mutsuzaki 2012-11-05, 18:33
Copy link to this message
Re: zookeeper_interest returning ZOK on Connection Loss

I don't think just returning error after checking SO_ERROR is sufficient here,
since this indicates connection loss but not session expiration. The client
could still connect to a different zk server on future iterations of
zookeeper_interest(), which will not occur if we return error. I think what is
needed here is to keep track of the amount of time that has elapsed since the
last time a connection was established to the server, while trying to connect
to servers in the list. If this time exceeds session_timeout, then return an
error such as ZOO_EXPIRED_SESSION_STATE. I've patched my local zookeeper client
to keep track of this state.

In addition, It's not clear to me how you would go about checking for SO_ERROR
within zookeeper.c, since you'll need to check SO_ERROR after calling select(),
which -- at least in my case, since I'm using the single threaded client, and I
am embedding in the node.js/libuv runtime -- needs to be outside of the C
client. I'd like to hear your thoughts on how this could be better achieved. I
would propose the following patch:

1) zookeeper_interest needs to return state about it's current connection
status, since the zhandle is opaque. This will allow consuming clients to check
for the status of the non-blocking connect.  

2) zookeeper_interest has to keep track of the last time it was connected to a
server, and return ZOO_EXPIRED_SESSION_STATE if it's still unable to connect
after the timeout exceeds the session timeout.  

3) some sample code/documentation around checking for the status of
non-blocking connects in user code for consumers of zookeeper.h.

Does this seem reasonable? Would you guys be open to taking my patch?


On Nov 5, 2012, at 10:33 AM, Michi Mutsuzaki <[EMAIL PROTECTED]> wrote:

> Hi Yunong,
> Yes, this looks like a bug. The problem is that the C client is not
> handling the case when connect() returns EINPROGRESS or EWOULDBLOCK
> and eventually fails. I think the right fix is to check SO_ERROR after
> the socket becomes writable. Please go ahead and open a jira.
> Thanks!
> --Michi
> On Sun, Nov 4, 2012 at 2:11 PM, Yunong Xiao <[EMAIL PROTECTED]> wrote:
>> I have a fairly simple single-threaded C client set up -- single-threaded
>> because we are embedding zk in the node.js/libuv runtime -- which consists of
>> the following algorithm:
>> zookeeper_interest(); select();
>> // perform zookeeper api calls
>> zookeeper_process();
>> I've noticed that zookeeper_interest in the C client never returns error if it
>> is unable to connect to the zk server.
>> From the spec of the zookeeper_interest API, I see that zookeeper_interest is
>> supposed to return ZCONNECTIONLOSS when disconnected from the client. However,
>> digging into the code, I see that the client is making a non-blocking connect
>> call
>> https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613
>> ,  and returning ZOK
>> https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684
>> If we assume that the server is not up, this will mean that the subsequent
>> select() call would return 0, since the fd is not ready, and future calls to
>> zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS.
>> Thus an upstream client will never be aware that the connection is lost.
>> I don't think this is the expected behavior. I have temporarily patched the zk
>> C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still
>> unable to connect after session_timeout has been exceeded.
>> Is this the right interpretation of the API? Are you guys open to taking the
>> patch I described?
>> -Yunong
Michi Mutsuzaki 2012-11-05, 21:59
Yunong Xiao 2012-11-05, 22:16
Michi Mutsuzaki 2012-11-06, 07:24
Yunong Xiao 2012-11-14, 01:08
Anthony Barré 2013-03-27, 09:41