Home | About | Sematext search-lucene.com search-hadoop.com
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB
 Search Hadoop and all its subprojects:

Switch to Plain View
Zookeeper >> mail # dev >> zookeeper_interest returning ZOK on Connection Loss


+
Yunong Xiao 2012-11-04, 22:11
+
Michi Mutsuzaki 2012-11-05, 18:33
Copy link to this message
-
Re: zookeeper_interest returning ZOK on Connection Loss
Michi,

I don't think just returning error after checking SO_ERROR is sufficient here,
since this indicates connection loss but not session expiration. The client
could still connect to a different zk server on future iterations of
zookeeper_interest(), which will not occur if we return error. I think what is
needed here is to keep track of the amount of time that has elapsed since the
last time a connection was established to the server, while trying to connect
to servers in the list. If this time exceeds session_timeout, then return an
error such as ZOO_EXPIRED_SESSION_STATE. I've patched my local zookeeper client
to keep track of this state.

In addition, It's not clear to me how you would go about checking for SO_ERROR
within zookeeper.c, since you'll need to check SO_ERROR after calling select(),
which -- at least in my case, since I'm using the single threaded client, and I
am embedding in the node.js/libuv runtime -- needs to be outside of the C
client. I'd like to hear your thoughts on how this could be better achieved. I
would propose the following patch:

1) zookeeper_interest needs to return state about it's current connection
status, since the zhandle is opaque. This will allow consuming clients to check
for the status of the non-blocking connect.  

2) zookeeper_interest has to keep track of the last time it was connected to a
server, and return ZOO_EXPIRED_SESSION_STATE if it's still unable to connect
after the timeout exceeds the session timeout.  

3) some sample code/documentation around checking for the status of
non-blocking connects in user code for consumers of zookeeper.h.

Does this seem reasonable? Would you guys be open to taking my patch?

-Yunong

On Nov 5, 2012, at 10:33 AM, Michi Mutsuzaki <[EMAIL PROTECTED]> wrote:

> Hi Yunong,
>
> Yes, this looks like a bug. The problem is that the C client is not
> handling the case when connect() returns EINPROGRESS or EWOULDBLOCK
> and eventually fails. I think the right fix is to check SO_ERROR after
> the socket becomes writable. Please go ahead and open a jira.
>
> Thanks!
> --Michi
>
> On Sun, Nov 4, 2012 at 2:11 PM, Yunong Xiao <[EMAIL PROTECTED]> wrote:
>> I have a fairly simple single-threaded C client set up -- single-threaded
>> because we are embedding zk in the node.js/libuv runtime -- which consists of
>> the following algorithm:
>>
>> zookeeper_interest(); select();
>> // perform zookeeper api calls
>> zookeeper_process();
>>
>> I've noticed that zookeeper_interest in the C client never returns error if it
>> is unable to connect to the zk server.
>>
>> From the spec of the zookeeper_interest API, I see that zookeeper_interest is
>> supposed to return ZCONNECTIONLOSS when disconnected from the client. However,
>> digging into the code, I see that the client is making a non-blocking connect
>> call
>> https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1596-1613
>> ,  and returning ZOK
>> https://github.com/apache/zookeeper/blob/trunk/src/c/src/zookeeper.c#L1684
>>
>> If we assume that the server is not up, this will mean that the subsequent
>> select() call would return 0, since the fd is not ready, and future calls to
>> zookeeper_interest will always return 0 and not the expected ZCONNECTIONLOSS.
>> Thus an upstream client will never be aware that the connection is lost.
>>
>> I don't think this is the expected behavior. I have temporarily patched the zk
>> C client such that zookeeper_interest will return ZCONNECTIONLOSS if it's still
>> unable to connect after session_timeout has been exceeded.
>>
>> Is this the right interpretation of the API? Are you guys open to taking the
>> patch I described?
>>
>> -Yunong
+
Michi Mutsuzaki 2012-11-05, 21:59
+
Yunong Xiao 2012-11-05, 22:16
+
Michi Mutsuzaki 2012-11-06, 07:24
+
Yunong Xiao 2012-11-14, 01:08
+
Anthony Barré 2013-03-27, 09:41
NEW: Monitor These Apps!
elasticsearch, apache solr, apache hbase, hadoop, redis, casssandra, amazon cloudwatch, mysql, memcached, apache kafka, apache zookeeper, apache storm, ubuntu, centOS, red hat, debian, puppet labs, java, senseiDB