

Re: lost ZK events across datacenters
We are on Java 6.

Jun

On Mon, Jun 6, 2011 at 12:13 PM, Fournier, Camille F. [Tech] <
[EMAIL PROTECTED]> wrote:

> Hey Jun, question: What version of Java are your clients running? I keep
> hitting a bug in my Java 5 test suite and I'm wondering if in fact I am
> seeing the same problem you're reporting here.
>
> C
>
> -----Original Message-----
> From: Jun Rao [mailto:[EMAIL PROTECTED]]
> Sent: Friday, June 03, 2011 12:59 PM
> To: [EMAIL PROTECTED]
> Subject: Re: lost ZK events across datacenters
>
> I don't expect that we can discover the problem right now. However, what
> can I do to collect enough tracing should the problem occur again in the
> future (e.g., is INFO level logging enough)?
>
> Thanks,
>
> Jun
>
> On Fri, Jun 3, 2011 at 9:56 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
>
> > The logs don't have any state-changing entries around the time the
> > watcher is triggered, in any of the clients.
> >
> > Jun
> >
> >
> > On Fri, Jun 3, 2011 at 9:32 AM, Fournier, Camille F. [Tech] <
> > [EMAIL PROTECTED]> wrote:
> >
> >> Any state changes for the problem client between setting the watch and
> >> when you expected it to get called? Do you have logs for that client vs
> >> the others that show anything?
> >>
> >> -----Original Message-----
> >> From: Jun Rao [mailto:[EMAIL PROTECTED]]
> >> Sent: Friday, June 03, 2011 4:40 AM
> >> To: [EMAIL PROTECTED]
> >> Subject: Re: lost ZK events across datacenters
> >>
> >> Ben,
> >>
> >> Some details below.
> >>
> >> The call that sets the watcher simply calls getChildren with the watcher
> >> flag set to true. The triggering change is that one of the child nodes
> >> (which is ephemeral) is deleted because the creating client is gone.
> >>
> >> Thanks,
> >>
> >> Jun
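
[Editor's sketch] The getChildren-with-watch call described above relies on ZooKeeper watches being one-time triggers: once a watch fires, it has to be set again (by another read with the watch flag) before further changes produce events. The following is a minimal in-memory illustration of that behaviour only. It does not use the ZooKeeper client library; `FakeParent`, `ChildWatcher`, and both demo methods are hypothetical names invented for this sketch, and it says nothing about what actually went wrong in the reported case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

/** In-memory stand-in for a parent node with one-shot child watches (not the ZooKeeper API). */
public class OneShotWatchSketch {

    /** Hypothetical callback type for this sketch. */
    public interface ChildWatcher { void childrenChanged(); }

    /** Hypothetical parent node: every registered watch fires at most once. */
    public static class FakeParent {
        private final TreeSet<String> children = new TreeSet<>();
        private final List<ChildWatcher> watchers = new ArrayList<>();

        /** Reads the children and, if a watcher is given, registers it in the same step. */
        public synchronized List<String> getChildren(ChildWatcher watcher) {
            if (watcher != null) {
                watchers.add(watcher);
            }
            return new ArrayList<>(children);
        }

        public void create(String child) { change(() -> children.add(child)); }

        public void delete(String child) { change(() -> children.remove(child)); }

        /** Applies a change, then fires and discards all currently registered watches. */
        private void change(Runnable mutation) {
            List<ChildWatcher> toFire;
            synchronized (this) {
                mutation.run();
                toFire = new ArrayList<>(watchers);
                watchers.clear(); // one-shot: a fired watch is gone until set again
            }
            for (ChildWatcher w : toFire) {
                w.childrenChanged();
            }
        }
    }

    /** Two deletions, watch set once: only the first deletion is notified. */
    public static int notificationsWithoutReRegistering() {
        FakeParent parent = new FakeParent();
        parent.create("a");
        parent.create("b");
        int[] fired = {0};
        parent.getChildren(() -> fired[0]++);
        parent.delete("a"); // fires the watch
        parent.delete("b"); // no watch is set any more, so nothing fires
        return fired[0];
    }

    /** Two deletions, watch set again inside the callback: both are notified. */
    public static int notificationsWithReRegistering() {
        FakeParent parent = new FakeParent();
        parent.create("a");
        parent.create("b");
        int[] fired = {0};
        ChildWatcher[] self = new ChildWatcher[1];
        self[0] = () -> {
            fired[0]++;
            parent.getChildren(self[0]); // re-read and re-register in one step
        };
        parent.getChildren(self[0]);
        parent.delete("a");
        parent.delete("b");
        return fired[0];
    }

    public static void main(String[] args) {
        System.out.println("without re-registering: " + notificationsWithoutReRegistering());
        System.out.println("with re-registering:    " + notificationsWithReRegistering());
    }
}
```

In this toy model the re-registration happens synchronously inside the callback, so no change can slip in between. A real client re-registers some time after the event arrives, which is why the children returned by the re-read, rather than the count of events, should be treated as the source of truth.
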
> >>
> >> On Thu, Jun 2, 2011 at 10:49 AM, Benjamin Reed <[EMAIL PROTECTED]> wrote:
> >>
> >> > can you tell us a bit more about the scenario? what was the call that
> >> > set the watch event? and what were the changes that caused the event?
> >> >
> >> > thanx
> >> > ben
> >> >
> >> > On Wed, Jun 1, 2011 at 3:14 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >> > > All my clients were on different machines. 2 of them got the watcher
> >> > > fired about the same time. The third one never got the watcher triggered.
> >> > >
> >> > > Thanks,
> >> > >
> >> > > Jun
> >> > >
> >> > > On Wed, Jun 1, 2011 at 2:18 PM, Fournier, Camille F. [Tech] <
> >> > > [EMAIL PROTECTED]> wrote:
> >> > >
> >> > >> All clients are in different processes?
> >> > >> I've used zkclient and haven't seen any problems, but I haven't
> >> > >> hammered it too hard yet. I took a long look at the code and didn't see
> >> > >> any errors, but there could always be something very subtle.
> >> > >>
> >> > >> -----Original Message-----
> >> > >> From: Jun Rao [mailto:[EMAIL PROTECTED]]
> >> > >> Sent: Wednesday, June 01, 2011 4:09 PM
> >> > >> To: [EMAIL PROTECTED]
> >> > >> Subject: Re: lost ZK events across datacenters
> >> > >>
> >> > >> I am using the zkclient package
> >> > >> (https://github.com/sgroschupf/zkclient.git).
> >> > >> The watcher code seems reasonable. Basically, each watcher event is
> >> > >> first added to a queue. A separate event thread dequeues each event,
> >> > >> reads the children of a path (which re-registers the watcher), and
> >> > >> invokes the registered listener.
> >> > >>
> >> > >> Does anybody know of any issues in zkclient?
> >> > >>
> >> > >> Thanks,
> >> > >>
> >> > >> Jun
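
[Editor's sketch] The queue-plus-event-thread design described above can be illustrated roughly as follows. This is not zkclient's code: `EventThreadSketch`, `onWatchFired`, `stopAndGetDelivered`, `demo`, and the `/group` path are hypothetical names for this sketch, and the "children" are a plain in-memory list rather than a ZooKeeper node. The point it tries to show is that the listener receives the children as they are when the event is dequeued, not as they were when the watch fired.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.function.Supplier;

/** Sketch of "enqueue the watch event, let a separate thread re-read and notify". */
public class EventThreadSketch {
    private static final String STOP = "__stop__"; // sentinel that ends the event thread

    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final List<List<String>> delivered = new CopyOnWriteArrayList<>();
    private final Thread worker;

    /** readChildren stands in for "read the children and re-register the watch". */
    public EventThreadSketch(Supplier<List<String>> readChildren) {
        worker = new Thread(() -> {
            try {
                while (true) {
                    String path = queue.take();
                    if (STOP.equals(path)) {
                        return;
                    }
                    // The snapshot is taken at dequeue time, not at event time.
                    delivered.add(readChildren.get());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "event-thread");
        worker.start();
    }

    /** Called from the watcher callback: only enqueues, never blocks on listeners. */
    public void onWatchFired(String path) {
        queue.add(path);
    }

    /** Stops the event thread and returns the snapshots handed to the listener. */
    public List<List<String>> stopAndGetDelivered() {
        queue.add(STOP);
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return delivered;
    }

    /** Two child deletions, each followed by a simulated watch event. */
    public static List<List<String>> demo() {
        List<String> children = new CopyOnWriteArrayList<>(Arrays.asList("a", "b", "c"));
        EventThreadSketch sketch = new EventThreadSketch(() -> new ArrayList<>(children));
        children.remove("a");
        sketch.onWatchFired("/group");
        children.remove("b");
        sketch.onWatchFired("/group");
        return sketch.stopAndGetDelivered();
    }

    public static void main(String[] args) {
        // The first snapshot may already reflect both deletions; the last one always does.
        System.out.println(demo());
    }
}
```

Because the first snapshot depends on thread timing, a listener built this way should compare each snapshot with the previous one it saw instead of assuming one event per change.
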
> >> > >>
> >> > >> On Wed, Jun 1, 2011 at 12:04 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >> > >>
> >> > >> > This is most commonly due, in my own history of programming errors,
> >> > >> > to writing code that has a race window in it.  It is conceivable that
> >> > >> > cross data-center operation would make such a race more of a problem.
> >> > >> >
> >> > >> > Can you say a bit about your code?  Did you make sure to use