Re: lost ZK events across datacenters
Ben,

The log is binary. Is there a log reader? Also, can I just look at the log
on any zookeeper server?

Thanks,

Jun
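
Editorial sketch, not part of the original message: the two scenarios Ben lists in the quoted reply below can be told apart once the relevant transaction log entries have been extracted in some readable form. The snippet assumes that extraction has already happened. `TxnRecord`, its fields, and `classify_missed_watch` are hypothetical names for illustration, not ZooKeeper types. It also assumes the log timestamps and the client's watch-registration time are comparable, which may not hold across datacenters with clock skew. How an ephemeral node's removal appears in the log (for example as part of a session-close entry rather than a per-node delete) is left to the extraction step.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class TxnRecord:
    """Hypothetical, already-decoded log entry that changed `path`."""
    zxid: int      # transaction id; assumed to increase with commit order
    time_ms: int   # server-side timestamp of the transaction
    path: str      # znode affected by the change


def classify_missed_watch(txns: Iterable[TxnRecord],
                          watch_set_ms: int,
                          child_path: str) -> str:
    """Classify why a watch may not have fired for a change to child_path."""
    changes = [t for t in txns if t.path == child_path]
    if not changes:
        # Scenario 2: the change never reached this server's log.
        return "change-never-logged"
    first = min(changes, key=lambda t: t.zxid)
    if first.time_ms < watch_set_ms:
        # Scenario 1: the change was applied before the watch was set.
        return "change-before-watch"
    # The change was logged after the watch was set, so the event should
    # have been delivered; the client side is the next place to look.
    return "change-after-watch"
```

If the result is "change-after-watch", neither of the two scenarios explains the missing event, and the client's event handling becomes the more likely suspect.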

On Fri, Jun 3, 2011 at 10:18 AM, Benjamin Reed <[EMAIL PROTECTED]> wrote:

> actually, i think the transaction log could help a lot, and that will
> always be there. two scenarios i can think of are:
> 1) the change happened before the watch was set
> 2) the change never got there
> you could get an answer to both of those questions by looking at the
> transaction log.
>
> ben
>
> On Fri, Jun 3, 2011 at 9:59 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> > I don't expect that we can discover the problem right now. However, what are
> > the things that I can do to collect enough tracing should the problem occur
> > again in the future (e.g., is INFO level logging enough)?
> >
> > Thanks,
> >
> > Jun
> >
> > On Fri, Jun 3, 2011 at 9:56 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >
> >> The log doesn't have any state changing entries around the time the watcher
> >> is triggered, in all clients.
> >>
> >> Jun
> >>
> >>
> >> On Fri, Jun 3, 2011 at 9:32 AM, Fournier, Camille F. [Tech] <[EMAIL PROTECTED]> wrote:
> >>
> >>> Any state changes for the problem client between setting the watch and
> >>> when you expected it to get called? Do you have logs for that client vs the
> >>> others that show anything?
> >>>
> >>> -----Original Message-----
> >>> From: Jun Rao [mailto:[EMAIL PROTECTED]]
> >>> Sent: Friday, June 03, 2011 4:40 AM
> >>> To: [EMAIL PROTECTED]
> >>> Subject: Re: lost ZK events across datacenters
> >>>
> >>> Ben,
> >>>
> >>> Some details below.
> >>>
> >>> The call that sets the watcher simply calls getChildren with the watcher flag
> >>> set to true. The triggering change is that one of the child nodes (which is
> >>> ephemeral) is deleted because the creating client is gone.
> >>>
> >>> Thanks,
> >>>
> >>> Jun
> >>>
> >>> On Thu, Jun 2, 2011 at 10:49 AM, Benjamin Reed <[EMAIL PROTECTED]> wrote:
> >>>
> >>> > can you tell us a bit more about the scenario? what was the call that
> >>> > set the watch event? and what were the changes that caused the event?
> >>> >
> >>> > thanx
> >>> > ben
> >>> >
> >>> > On Wed, Jun 1, 2011 at 3:14 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> >>> > > All my clients were on different machines. 2 of them got the watcher fired
> >>> > > about the same time. The third one never got the watcher triggered.
> >>> > >
> >>> > > Thanks,
> >>> > >
> >>> > > Jun
> >>> > >
> >>> > > On Wed, Jun 1, 2011 at 2:18 PM, Fournier, Camille F. [Tech] <[EMAIL PROTECTED]> wrote:
> >>> > >
> >>> > >> All clients are in different processes?
> >>> > >> I've used zkclient and haven't seen any problems, but I haven't hammered it
> >>> > >> too hard yet. I took a long look at the code and didn't see any errors but
> >>> > >> there could always be something very subtle.
> >>> > >>
> >>> > >> -----Original Message-----
> >>> > >> From: Jun Rao [mailto:[EMAIL PROTECTED]]
> >>> > >> Sent: Wednesday, June 01, 2011 4:09 PM
> >>> > >> To: [EMAIL PROTECTED]
> >>> > >> Subject: Re: lost ZK events across datacenters
> >>> > >>
> >>> > >> I am using the zkclient package (https://github.com/sgroschupf/zkclient.git).
> >>> > >> The watcher code seems reasonable. Basically, each watcher event is first
> >>> > >> added to a queue. A separate event thread dequeues each event and reads the
> >>> > >> children of a path (which re-registers the watcher) and invokes the
> >>> > >> registered listener.
> >>> > >>
> >>> > >> Does anybody know of any issues in zkclient?
> >>> > >>
> >>> > >> Thanks,
> >>> > >>
> >>> > >> Jun
> >>> > >>
> >>> > >> On Wed, Jun 1, 2011 at 12:04 PM, Ted Dunning <[EMAIL PROTECTED]> wrote:
> >>> > >>
> >>> > >> > This is most commonly due, in my own history of programming errors, to
> >>> > >> > writing code that has a race window in it.  It is conceivable that