Zookeeper >> mail # user >> Server seems not to be sending keep-alives so I lose my session ("have not heard from server....")


Brian Tarbox 2013-02-25, 16:51
Camille Fournier 2013-02-25, 16:56
Re: Server seems not to be sending keep-alives so I lose my session ("have not heard from server....")
The server logs don't say anything.  I do have a theory based on reading
the code, specifically the SendThread class within ClientCnxn.java

It took me a while to figure out that it's the client that sends the ping,
because the error message says "have not heard from *server* ...".
Once I got past that, the key line in the code is:

int timeToNextPing = readTimeout / 2 - clientCnxnSocket.getIdleSend();

This basically means that the client gets at most 2 tries to send the
ping within the timeout interval, no matter what you set the timeout value
to.
In a lossy network this may be insufficient, as can be seen from my client
logs, where the client can go 30 seconds without sending a ping.

I'm running a test now where I've changed the "2" to a "4".  I trade a tiny
increase in network traffic for a much higher chance of getting a
successful ping even in a bad network environment.
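
To make the trade-off concrete, here is a minimal sketch of the arithmetic only. It is not the ZooKeeper implementation: the class PingSchedule, its method names, and the divisor parameter are hypothetical, and the 26667 ms read timeout is an assumption inferred from the roughly 13.3 second spacing of the ping responses in the quoted logs.

```java
// Sketch of the ping-scheduling arithmetic; not the ZooKeeper implementation.
// PingSchedule, its methods, and the divisor parameter are hypothetical names.
final class PingSchedule {

    // Idle time (ms) after which the client would consider a ping due.
    static int pingIntervalMs(int readTimeoutMs, int divisor) {
        return readTimeoutMs / divisor;
    }

    // Same shape as the quoted expression, with the hard-coded 2 made a parameter.
    // In this sketch a result <= 0 means "a ping is due now".
    static int timeToNextPing(int readTimeoutMs, int idleSendMs, int divisor) {
        return pingIntervalMs(readTimeoutMs, divisor) - idleSendMs;
    }

    public static void main(String[] args) {
        int readTimeoutMs = 26667; // assumption inferred from the ~13.3 s ping spacing
        int idleSendMs = 10000;    // hypothetical: 10 s since the last send

        for (int divisor : new int[] {2, 4}) {
            System.out.println("divisor=" + divisor
                    + " pingIntervalMs=" + pingIntervalMs(readTimeoutMs, divisor)
                    + " timeToNextPing=" + timeToNextPing(readTimeoutMs, idleSendMs, divisor));
        }
    }
}
```

Under these assumed numbers, a divisor of 2 gives a ping interval of 13333 ms and 3333 ms still to wait after 10 seconds idle, while a divisor of 4 gives 6666 ms and a value of -3334, meaning the ping is already overdue. So the change roughly doubles the ping traffic in exchange for twice as many send opportunities inside the same timeout.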

Brian
On Mon, Feb 25, 2013 at 11:56 AM, Camille Fournier <[EMAIL PROTECTED]> wrote:

> What do your server logs say during this time?
>
>
> On Mon, Feb 25, 2013 at 11:51 AM, Brian Tarbox <[EMAIL PROTECTED]> wrote:
>
> > I am getting the dreaded message:
> >
> >  10:59:45,871 INFO [org.apache.zookeeper.ClientCnxn] - <Client session
> > timed out, have not heard from server in 31482ms for sessionid
> > 0x13d11dd08160007, closing socket connection and attempting reconnect>
> >
> > and from looking at the logs it certainly seems that the keep-alive
> > messages are sometimes just not being sent.
> >
> > In my case I see a bunch of these:
> > 10:58:00,164 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:13,511 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:26,857 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:58:40,205 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> > 10:59:14,140 DEBUG [org.apache.zookeeper.ClientCnxn] - <Got ping response
> > for sessionid: 0x13d11dd08160007 after 0ms>
> >
> > But then nothing from 10:59:14 until 10:59:45, when my client decides it's
> > been too long and times out.
> >
> > I'm running 3.4.5 on EC2 ...any suggestions welcome.
> >
> > Thanks.
> >
> > Brian Tarbox
> > --
> > http://about.me/BrianTarbox
> >
>

--
http://about.me/BrianTarbox
Camille Fournier 2013-02-26, 18:10