Zookeeper, mail # user - Zookeeper session losing some watchers


Re: Zookeeper session losing some watchers
Jun Rao 2011-11-08, 00:27
Jamie,

We do use chroot. However, the chroot problem will lose all watchers, not
some watchers, right?

Thanks,

Jun

On Wed, Nov 2, 2011 at 7:34 PM, Jamie Rothfeder <[EMAIL PROTECTED]> wrote:

> Hi Neha,
>
> I encountered a similar problem with zookeeper losing watches and found
> that it was related to this bug:
>
> https://issues.apache.org/jira/browse/ZOOKEEPER-961
>
> Are you using a chroot?
>
> Thanks,
> Jamie
>
> On Wed, Nov 2, 2011 at 1:16 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote:
>
> > Hi,
> >
> > We've been seeing a problem with our zookeeper servers lately, where
> > all of a sudden a session loses some of the watchers registered on
> > some of the znodes. Let me explain our Kafka-ZK setup. We have a Kafka
> > cluster in one DC that establishes sessions (with a 6 sec timeout) with
> > a ZK cluster (of 4 machines) in another DC and registers watchers on
> > some zookeeper paths. Every couple of weeks, we observe a problem with
> > the Kafka servers and, on investigating further, we find that the
> > session lost some of the key watches, but not all.
> >
> > The last time this happened, we ran the wchc command on the ZK servers
> > and saw the problem. Unfortunately, we lost relevant information from
> > the ZK logs by the time we were ready to debug it further. Since this
> > causes Kafka servers to stop making progress, we want to set up some
> > kind of alert when this happens. This will help us collect more
> > information to give you. Particularly, we were thinking about running
> > wchp periodically (maybe once a minute), grepping for the ZK paths and
> > counting the number of watches that should be registered for correct
> > operation. But I observed that the watcher info is not replicated
> > across all ZK servers, so we would have to query every ZK server
> > in order to get the full list.
> >
> > I'm not sure running wchp periodically on all ZK servers is the best
> > option for this alert. Can you think of what could be the problem here
> > and how we can set up this alert for now?
> >
> > Thanks
> > Neha
> >
>
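
For anyone wanting to try the periodic check Neha describes (run wchp against every ZK server, merge the results, and count watches per path), here is a minimal Python sketch. It is not something posted in the thread. It assumes wchp replies with each znode path on an unindented line followed by one indented session id per line. The function names, the example paths, and the expected counts are all hypothetical and would need to match your own deployment.

```python
import socket


def four_letter_word(host, port, cmd="wchp", timeout=5.0):
    """Send a ZooKeeper four-letter command and return the raw text reply."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_wchp(text):
    """Map each watched path to the set of session ids watching it.

    Assumed format: unindented line = znode path,
    indented lines below it = session ids.
    """
    watches = {}
    current_path = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line[0] in " \t":
            if current_path is not None:
                watches[current_path].add(line.strip())
        else:
            current_path = line.strip()
            watches.setdefault(current_path, set())
    return watches


def merge_watches(per_server):
    """Union the per-server results, since each server only knows
    about the watches of the sessions connected to it."""
    merged = {}
    for watches in per_server:
        for path, sessions in watches.items():
            merged.setdefault(path, set()).update(sessions)
    return merged


def under_watched(merged, expected):
    """Return {path: (found, wanted)} for paths with too few watches."""
    result = {}
    for path, wanted in expected.items():
        found = len(merged.get(path, ()))
        if found < wanted:
            result[path] = (found, wanted)
    return result
```

A cron job or monitoring agent could call `four_letter_word` once per server, pass each reply through `parse_wchp`, merge the results, and alert whenever `under_watched` returns a non-empty dict, for example with a hypothetical `expected = {"/brokers/ids": 3}`. Counting distinct session ids per path, rather than raw lines, keeps the check stable if the same session shows up in more than one reply.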