Zookeeper, mail # dev - Re: [jira] [Commented] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host


Re: [jira] [Commented] (ZOOKEEPER-1057) zookeeper c-client, connection to offline server fails to successfully fallback to second zk host
Marshall McMullen 2013-12-20, 23:58
The logic for connecting to servers in trunk (3.5.0) is substantially
different from what was in 3.4.6. Has this bug been seen in 3.4.6 or in trunk?
On Fri, Dec 20, 2013 at 4:14 PM, Flavio Junqueira (JIRA) <[EMAIL PROTECTED]> wrote:

>
>     [
> https://issues.apache.org/jira/browse/ZOOKEEPER-1057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13854652#comment-13854652]
>
> Flavio Junqueira commented on ZOOKEEPER-1057:
> ---------------------------------------------
>
> If this is a change due to reconfig, do we really need to block 3.4.6?
>
> > zookeeper c-client, connection to offline server fails to successfully
> fallback to second zk host
> >
> -------------------------------------------------------------------------------------------------
> >
> >                 Key: ZOOKEEPER-1057
> >                 URL:
> https://issues.apache.org/jira/browse/ZOOKEEPER-1057
> >             Project: ZooKeeper
> >          Issue Type: Bug
> >          Components: c client
> >    Affects Versions: 3.3.1, 3.3.2, 3.3.3
> >         Environment: snowdutyrise-lm ~/-> uname -a
> > Darwin snowdutyrise-lm 9.8.0 Darwin Kernel Version 9.8.0: Wed Jul 15
> 16:55:01 PDT 2009; root:xnu-1228.15.4~1/RELEASE_I386 i386
> > also observed on:
> > 2.6.35-28-server 49-Ubuntu SMP Tue Mar 1 14:55:37 UTC 2011
> >            Reporter: Woody Anderson
> >            Assignee: Michi Mutsuzaki
> >            Priority: Blocker
> >             Fix For: 3.4.6, 3.5.0
> >
> >         Attachments: ZOOKEEPER-1057.patch, ZOOKEEPER-1057.patch
> >
> >
> > Hello, I'm a contributor for the node.js zookeeper module:
> https://github.com/yfinkelstein/node-zookeeper
> > i'm using zk 3.3.3 for the purposes of this issue, but i have validated
> it fails on 3.3.1 and 3.3.2
> > i'm having an issue when trying to connect when one of my zookeeper
> servers is offline.
> > if the first server attempted is online, all is good.
> > if the offline server is attempted first, then the client is never able
> to connect to _any_ server.
> > inside zookeeper.c a connection loss (-4) is received, the socket is
> closed and buffers are cleaned up, it then attempts the next server in the
> list, creates a new socket (which gets the same fd as the previously closed
> socket) and connecting fails, and it continues to fail seemingly forever.
> > The nature of this "fail" is not that it gets -4 connection loss errors,
> but that zookeeper_interest doesn't find anything going on on the socket
> before the user provided timeout kicks things out. I don't want to have to
> wait 5 minutes, even if i could make myself.
> > this is the message that follows the connection loss:
> > 2011-04-27 23:18:28,355:13485:ZOO_ERROR@handle_socket_error_msg@1530:
> Socket [127.0.0.1:5020] zk retcode=-7, errno=60(Operation timed out):
> connection timed out (exceeded timeout by 3ms)
> > 2011-04-27 23:18:28,355:13485:ZOO_ERROR@yield@213:
> yield:zookeeper_interest returned error: -7 - operation timeout
> > While investigating, i decided to comment out close(zh->fd) in
> handle_error (zookeeper.c#1153)
> > now everything works (obviously i'm leaking an fd). Connecting to the
> second host works immediately.
> > this is the behavior i'm looking for, though i clearly don't want to
> leak the fd, so i'm wondering why the fd re-use is causing this issue.
> > close() is not returning an error (i checked even though current code
> assumes success).
> > i'm on osx 10.6.7
> > i tried adding a setsockopt so_linger (though i didn't want that to be a
> solution), it didn't work.
> > full debug traces are included in issue here:
> https://github.com/yfinkelstein/node-zookeeper/issues/6
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v6.1.4#6159)
>
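
For anyone trying to reproduce the fd re-use mentioned in the report (the new socket getting the same fd as the one just closed), here is a minimal sketch. It is not taken from zookeeper.c; the helper name fd_reused_after_close is made up for illustration. It relies only on the usual POSIX behaviour that a newly opened descriptor gets the lowest free number, and it assumes a single-threaded process where nothing else opens a descriptor in between.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of the ZooKeeper C client).
 * Opens a TCP socket, closes it, opens a second one, and reports
 * whether the second socket received the same descriptor number.
 *
 * Returns 1 if the number was reused, 0 if not, -1 on any error.
 * The two descriptor numbers are written to *first_out / *second_out.
 */
static int fd_reused_after_close(int *first_out, int *second_out)
{
    assert(first_out != NULL && second_out != NULL);

    int first = socket(AF_INET, SOCK_STREAM, 0);
    if (first < 0)
        return -1;

    /* Check the return value of close() instead of assuming success. */
    if (close(first) != 0)
        return -1;

    /* With no other descriptor opened in between, the kernel normally
     * hands back the lowest free number, i.e. the one just released. */
    int second = socket(AF_INET, SOCK_STREAM, 0);
    if (second < 0)
        return -1;

    *first_out = first;
    *second_out = second;

    close(second);
    return first == second;
}
```

Seeing the same number come back is expected and does not by itself explain the timeout. It does mean that any state keyed on the fd number (for example a registration with an event loop or poller in the embedding application) can end up pointing at the new socket unless it is torn down when the old one is closed.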