Home | About | Sematext search-lucene.com search-hadoop.com
 Search Hadoop and all its subprojects:

Switch to Threaded View
Zookeeper, mail # user - Server seems not to be sending keep-alives so I lose my session ("have not heard from server....")


Copy link to this message
-
Re: Server seems not to be sending keep-alives so I lose my session ("have not heard from server....")
Brian Tarbox 2013-02-25, 18:05
The server logs don't say anything.  I do have a theory based on reading
the code, specifically the SendThread class within ClientCnxn.java

It took me a while to figure that its the client that sends the ping due to
the error message being "have not heard from the *server *..."
Once I got past that the key line in the code is:

int timeToNextPing = readTimeout / 2  - clientCnxnSocket.getIdleSend()

This basically means that the client will get at most 2 tries to send the
ping within the timeout interval, no matter what you set the timeout value
to.
In a lossy network this may be insufficient...as can be seen from my client
logs where I can go 30 seconds without sending a ping.

I'm running a test now where I've changed the "2" to a "4".  I trade a tiny
increase in network traffic for a much higher chance of getting a
successful ping even in a bad network environment.

Brian
On Mon, Feb 25, 2013 at 11:56 AM, Camille Fournier <[EMAIL PROTECTED]>wrote:

> What do your server logs say during this time?
>
>
> On Mon, Feb 25, 2013 at 11:51 AM, Brian Tarbox <[EMAIL PROTECTED]
> >wrote:
>
> > I am getting the dreaded message:
> >
> >  10:59:45,871 INFO [org.apache.zookeeper.ClientCnxn] - <Client session
> > timed out, have not heard from server in 31482ms for sessionid
> > 0x13d11dd08160007, closing socket connection and attempting reconnect>
> >
> > and from looking at the logs it certainly seems that the keep alive
> > messages are sometime just not being sent.
> >
> > In my case I see a bunch of these:
> > 10:58:00,164 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:13,511 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:26,857 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:40,205 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:59:14,140 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> >
> > But then nothing from 10:59:14 until 10:59:45 when my client decides its
> > been too long and so times out.
> >
> > I'm running 3.4.5 on EC2 ...any suggestions welcome.
> >
> > Thanks.
> >
> > Brian Tarbox
> > --
> > http://about.me/BrianTarbox
> >
>

--
http://about.me/BrianTarbox