Zookeeper >> mail # user >> Timeouts and ping handling

Re: Timeouts and ping handling
If you aren't pushing much data through ZK, there is almost no way that the
request queue can fill up without the log or snapshot disks being slow.
 See what happens if you put the log onto a real disk or (heaven help us)
onto a tmpfs partition.
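The suggestion above can be sketched as a zoo.cfg fragment; the mount points below are illustrative, not taken from this thread:

```
# zoo.cfg -- keep the transaction log on its own (fast) device,
# separate from the snapshot directory, so fsync latency on one
# doesn't stall the other.
dataDir=/var/lib/zookeeper     # snapshots
dataLogDir=/mnt/txlog          # dedicated disk (or tmpfs, for testing only)
```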

On Thu, Jan 19, 2012 at 2:18 AM, Manosiz Bhattacharyya wrote:

> I will do as you mention.
> We are using the async APIs throughout. Also, we do not write much data
> into Zookeeper. We just use it for leadership elections and health
> monitoring, which is why we see the timeouts typically on idle zookeeper
> connections.
> The reason we want the sessions to stay alive is the leadership election
> algorithm we use from the ZooKeeper recipes. If the leader node's
> connection is broken, the ephemeral node that guaranteed its leadership is
> lost, and reconnecting creates a new node, which does not guarantee
> leadership. We then have to elect a new leader, which requires significant
> work. The bigger the timeout, the longer the cluster stays without a
> master for a particular service, as the old master cannot keep working
> once it knows its session, and with it its ephemeral node, is gone. As we
> are trying to run a highly available service (not internet scale, but at
> the scale of a storage system with ms latencies, typically), we thought
> about reducing the timeout while keeping the session open. Also note that
> the node that is typically the master does not write often into ZooKeeper.
> Thanks,
> Manosiz.
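The election recipe mentioned above picks a leader by ephemeral sequential znode order. A minimal sketch of that decision logic follows; the znode naming ("n_<seq>") and function names are illustrative, and a real implementation would create the znodes with CreateMode.EPHEMERAL_SEQUENTIAL through the ZooKeeper client API:

```python
# Sketch of the leader-election decision logic from the ZooKeeper
# recipe discussed above. Znode names and helper names are
# illustrative, not from the thread.

def sequence_number(znode_name):
    """ZooKeeper appends a 10-digit sequence to sequential znodes,
    e.g. 'n_0000000042' -> 42."""
    return int(znode_name.rsplit("_", 1)[1])

def elect_leader(children):
    """The participant whose znode has the lowest sequence is leader."""
    return min(children, key=sequence_number)

def watch_target(children, me):
    """Non-leaders watch only their immediate predecessor, so a single
    session expiry wakes one watcher instead of the whole herd."""
    ordered = sorted(children, key=sequence_number)
    idx = ordered.index(me)
    return None if idx == 0 else ordered[idx - 1]
```

This is why a lost session is so costly here: when the leader's session expires, its ephemeral znode disappears and leadership passes to the next-lowest sequence, which is exactly the re-election work the poster wants to avoid.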
> On Wed, Jan 18, 2012 at 5:49 PM, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> > On Wed, Jan 18, 2012 at 4:47 PM, Manosiz Bhattacharyya
> > <[EMAIL PROTECTED]> wrote:
> > > Thanks Patrick for your answer,
> >
> > No problem.
> >
> > > Actually we are in a virtualized environment; we have a FIO disk for
> > > transaction logs. It does have some latency sometimes during FIO
> > > garbage collection. We know this could be the potential issue, but we
> > > were trying to work around it.
> >
> > Ah, I see. I saw something very similar to this recently with SSDs
> > used for the datadir. The fdatasync latency was sometimes > 10
> > seconds. I suspect it happened as a result of disk GC activity.
> >
> > I was able to identify the problem by running something like this:
> >
> > sudo strace -r -T -f -p 8066 -e trace=fsync,fdatasync -o trace.txt
> >
> > and then graphing the results (log scale). You should try running this
> > against your servers to confirm that it is indeed the problem.
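One way to graph those results is to pull the per-call durations (the `<...>` suffix that `-T` appends) out of the trace file. A small sketch, assuming the common `strace -r -T` line shape; the exact layout varies by strace version:

```python
import re

# Sketch: extract per-call durations from "strace -r -T" output so
# fsync/fdatasync latencies can be graphed. Assumes the usual
# "name(args...) = ret <duration>" shape that -T produces.

SYSCALL_RE = re.compile(r"(\w+)\(.*<([\d.]+)>\s*$")

def sync_latencies(lines, calls=("fsync", "fdatasync")):
    """Return (syscall, seconds) pairs for the given syscalls."""
    result = []
    for line in lines:
        m = SYSCALL_RE.search(line)
        if m and m.group(1) in calls:
            result.append((m.group(1), float(m.group(2))))
    return result
```

Plotting the resulting seconds on a log scale, as suggested above, makes multi-second fdatasync outliers stand out immediately.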
> >
> > > We were trying to qualify the requests into two types: either
> > > heartbeats (HBs) or normal requests. Isn't it better to reject normal
> > > requests once the queue fills to a certain threshold, but keep the
> > > session alive? That way, flow control can be achieved by the user's
> > > session retrying the operation, while session health is maintained.
> >
> > What good is a session (connection) that's not usable? You're better
> > off disconnecting and re-establishing with a server that can process
> > your requests in a timely fashion.
> >
> > ZK looks at availability from a service perspective, not from an
> > individual session/connection perspective. The whole is more important
> > than the parts. There is already very sophisticated flow control going
> > on - e.g. the sessions shut down and stop reading requests when the
> > number of outstanding requests on a server exceeds some threshold.
> > Once the server catches up it starts reading again. Again - check
> > your "stat" results for insight into this (i.e., "outstanding requests").
> >
> > Patrick
> >
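The "stat" output Patrick refers to can be polled with, e.g., `echo stat | nc host 2181`. A small sketch of reading the outstanding-request counter from that reply; the sample text in use below is illustrative output, not captured from the thread's cluster:

```python
# Sketch of pulling the "Outstanding" counter from the ZooKeeper
# "stat" four-letter word. A sustained non-zero value here is the
# throttling condition Patrick describes.

def outstanding_requests(stat_output):
    """Return the queued-request count from a server's `stat` reply,
    or None if the line is absent."""
    for line in stat_output.splitlines():
        if line.startswith("Outstanding:"):
            return int(line.split(":", 1)[1].strip())
    return None
```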