-
lost ZK events across datacenters
Jun Rao 2011-06-01, 18:40
Hi,

I have a setup where multiple ZK clients are sitting in a different datacenter from the ZK server. All clients registered the same child watcher on a path. However, when the children of the path changed, the watcher on 1 of the clients didn't fire. This seems to have happened a couple of times to me. I am using ZK 3.3.3. Has anyone used ZK in a cross datacenter setup and seen problems like that before?

Thanks,

Jun
-
Re: lost ZK events across datacenters
Ted Dunning 2011-06-01, 19:04
This is most commonly due, in my own history of programming errors, to
writing code that has a race window in it. It is conceivable that cross data-center operation would make such a race more of a problem.

Can you say a bit about your code? Did you make sure to use standard idioms as opposed to setting the watch in a different call from reading the data?
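To make the race window mentioned above concrete, here is a minimal sketch. It is a toy in-memory model, not the ZooKeeper API: `ToyNode`, `racy_read`, and `safe_read` are hypothetical names invented for this illustration, and the only assumption carried over from ZooKeeper is that watches are one-shot and can be set in the same call that reads the children.

```python
class ToyNode:
    """In-memory stand-in for a znode with one-shot child watches.

    Hypothetical model for illustration only; not the ZooKeeper client API.
    """

    def __init__(self, children=()):
        self.children = set(children)
        self._watchers = []

    def get_children(self, watcher=None):
        """Return a snapshot and, in the same step, optionally register a watch."""
        if watcher is not None:
            self._watchers.append(watcher)
        return sorted(self.children)

    def set_watch(self, watcher):
        """Register a watch without reading (only used to build the racy variant)."""
        self._watchers.append(watcher)

    def remove_child(self, name):
        """Apply a change and fire every pending watch exactly once."""
        self.children.discard(name)
        watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify()


def racy_read(node, watcher, change):
    snapshot = node.get_children()   # read without a watch
    change()                         # a change lands in the gap
    node.set_watch(watcher)          # watch is set too late to see it
    return snapshot


def safe_read(node, watcher, change):
    snapshot = node.get_children(watcher)  # read and watch in one call
    change()                               # the same change now triggers the watch
    return snapshot
```

In the racy variant the caller is left holding a stale snapshot and never hears about the change; in the safe variant the same change produces a notification. Whether anything like this happened in the setup described in the thread is not established here.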
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-01, 20:08
I am using the zkclient package (https://github.com/sgroschupf/zkclient.git).
The watcher code seems reasonable. Basically, each watcher event is first added to a queue. A separate event thread dequeues each event, reads the children of a path (which re-registers the watcher), and invokes the registered listener.

Does anybody know of any issues in zkclient?

Thanks,

Jun
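The queue-and-event-thread design described in this message can be sketched roughly as follows. This is not zkclient's code: `ToyNode` and `QueuedChildListener` are hypothetical names, the node is an in-memory stand-in with one-shot watches, and the initial read in `start()` is an assumption added so the sketch has a first watch to fire.

```python
import queue
import threading


class ToyNode:
    """In-memory stand-in for a znode with one-shot child watches (illustration only)."""

    def __init__(self, children=()):
        self.children = set(children)
        self._watchers = []
        self._lock = threading.Lock()

    def get_children(self, watcher):
        # Reading and registering the watch happen under one lock, i.e. in one step.
        with self._lock:
            self._watchers.append(watcher)
            return sorted(self.children)

    def remove_child(self, name):
        with self._lock:
            self.children.discard(name)
            watchers, self._watchers = self._watchers, []
        for notify in watchers:
            notify()


class QueuedChildListener:
    """Watch callbacks only enqueue; a separate thread re-reads and calls the listener."""

    def __init__(self, node, listener):
        self.node = node
        self.listener = listener
        self.events = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        # Assumed initial read: delivers the current children and sets the first watch.
        self.listener(self.node.get_children(self._on_watch))
        self.thread.start()

    def _on_watch(self):
        self.events.put("children-changed")

    def _run(self):
        while True:
            event = self.events.get()
            if event is None:  # sentinel pushed by stop()
                break
            # Re-reading the children also re-registers the one-shot watch.
            self.listener(self.node.get_children(self._on_watch))

    def stop(self):
        self.events.put(None)
        self.thread.join()
```

One property of this shape is that several changes arriving before the event thread catches up may collapse into a single notification, although the listener still ends up seeing the latest children because each notification re-reads them.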
-
Re: lost ZK events across datacenters
Ted Dunning 2011-06-01, 20:18
I am generally a bit skeptical of attempts to "simplify" the ZK API while
claiming to retain or improve reliability and availability. I haven't looked at zkclient for a long time, but I was dubious at one time.

As a counterexample, Kept Collections simplifies the API by presenting virtual collections, but they explicitly warn that this loses some information.
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-01, 21:05
The most important feature that I rely on zkclient for is hiding the
zkConnectionLoss exception (it just blocks and retries when the connection is in sync mode again). I assume that quite a few applications want something like that. Does it make sense for ZK to provide such functionality directly, instead of everyone implementing their own stuff?

Thanks,

Jun
-
RE: lost ZK events across datacenters
Fournier, Camille F. [Tec... 2011-06-01, 21:18
All clients are in different processes?
I've used zkclient and haven't seen any problems, but I haven't hammered it too hard yet. I took a long look at the code and didn't see any errors but there could always be something very subtle.
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-01, 22:14
All my clients were on different machines. 2 of them got the watcher fired
about the same time. The third one never got the watcher triggered.

Thanks,

Jun
-
Re: lost ZK events across datacenters
Ted Dunning 2011-06-01, 23:21
That is exactly the part of zkClient that I think is most subject to error
and is what I meant by inappropriate hiding of details.

You can't just assume that you can retry an operation on ZooKeeper and get the right result. The correct handling is considerably more subtle. Hiding that is not a good thing unless you say right up front that you are compromising either expressivity (as does Kept Collections) or correctness (as does zkClient).

On Wed, Jun 1, 2011 at 2:05 PM, Jun Rao <[EMAIL PROTECTED]> wrote:
> The most important feature that I rely zkclient on is to hide
> zkConnectionLoss exception (just block and retry when connection is in sync
> mode again). I assume that quite a few applications want something like
> that. Does it make sense for ZK to provide such functionality directly,
> instead of everyone implementing their own stuff?
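One way a blind retry can give the wrong result is sketched below. The message above does not name a specific failure, so this is an illustrative assumption: a non-idempotent write (a sequential create) is applied on the server, the reply is lost, and the client retries. `ToyServer`, `ConnectionLoss`, and `blind_retry` are hypothetical names for a toy model, not ZooKeeper or zkclient APIs.

```python
class ConnectionLoss(Exception):
    """Toy stand-in for 'the client cannot tell whether the request was applied'."""


class ToyServer:
    """In-memory model of a server that hands out sequential node names."""

    def __init__(self):
        self.nodes = []
        self._counter = 0
        self.drop_next_reply = False

    def create_sequential(self, prefix):
        name = f"{prefix}{self._counter:010d}"
        self._counter += 1
        self.nodes.append(name)        # the write is applied on the server...
        if self.drop_next_reply:
            self.drop_next_reply = False
            raise ConnectionLoss()     # ...but the reply never reaches the client
        return name


def blind_retry(operation, attempts=3):
    """Retry on connection loss without checking whether the first try took effect."""
    for _ in range(attempts):
        try:
            return operation()
        except ConnectionLoss:
            continue
    raise ConnectionLoss()
```

With the reply to the first attempt dropped, the retry succeeds and returns the second node's name, while the first node stays behind as an orphan the client does not know about. For idempotent reads the same retry would be harmless, which is the tradeoff discussed in the replies that follow.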
-
Re: lost ZK events across datacenters
Camille Fournier 2011-06-01, 23:28
I'm sure there are cases where an incorrect retry is bad, but for at least
some use cases it is a trivial aspect of correctness and a worthwhile tradeoff in terms of app simplicity.

C
-
Re: lost ZK events across datacenters
Ted Dunning 2011-06-01, 23:29
True.
-
Re: lost ZK events across datacenters
Benjamin Reed 2011-06-02, 17:49
can you tell us a bit more about the scenario? what was the call that
set the watch? and what were the changes that caused the event?

thanx
ben
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-03, 08:40
Ben,

Some details below.

The call that sets the watcher simply calls getChildren with the watcher flag set to true. The triggering change is that one of the child nodes (which is ephemeral) is deleted because the creating client is gone.

Thanks,

Jun
-
RE: lost ZK events across datacenters
Fournier, Camille F. [Tec... 2011-06-03, 16:32
Any state changes for the problem client between setting the watch and when you expected it to get called? Do you have logs for that client vs the others that show anything?
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-03, 16:56
The logs don't have any state-change entries around the time the watcher
was triggered, on any of the clients.

Jun
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-03, 16:59
I don't expect that we can discover the problem right now. However, what are
the things that I can do to collect enough tracing should the problem occur again in the future (e.g., is INFO level logging enough)?

Thanks,

Jun
-
Re: lost ZK events across datacentersBenjamin Reed 2011-06-03, 17:18
actually, i think the transaction log could help a lot, and that will always be there. two scenarios i can think of are:
1) the change happened before the watch was set
2) the change never got there
you could get an answer to both of those questions by looking at the transaction log.

ben
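As a rough illustration of the two scenarios above, the check could be sketched as follows. This is only a sketch under stated assumptions: `MissedWatchCheck`, `classify`, and the `Verdict` names are hypothetical and not part of ZooKeeper, the timestamps are assumed to have been read by hand from the client's own log and from the formatted transaction log, and the client and server clocks are assumed to be close enough to compare.

```java
import java.util.List;

// Hypothetical helper, not part of ZooKeeper: classifies a missed watch from
// timestamps gathered by hand (client log plus formatted transaction log).
class MissedWatchCheck {
    enum Verdict { CHANGE_BEFORE_WATCH, CHANGE_NEVER_LOGGED, CHANGE_AFTER_WATCH }

    /**
     * @param watchSetMillis when the client set the watch (from its own log)
     * @param changeMillis   times of logged transactions that changed the
     *                       watched path's children (empty if none were found)
     */
    static Verdict classify(long watchSetMillis, List<Long> changeMillis) {
        if (changeMillis.isEmpty()) {
            // Scenario 2: the change never reached the server's log.
            return Verdict.CHANGE_NEVER_LOGGED;
        }
        for (long t : changeMillis) {
            if (t > watchSetMillis) {
                // A change was logged after the watch was set, so the watch
                // should have fired; neither scenario explains the miss.
                return Verdict.CHANGE_AFTER_WATCH;
            }
        }
        // Scenario 1: every logged change predates the watch.
        return Verdict.CHANGE_BEFORE_WATCH;
    }
}
```

A `CHANGE_AFTER_WATCH` result rules out both scenarios and would point the investigation back at the client side.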
-
RE: lost ZK events across datacenters
Fournier, Camille F. [Tec... 2011-06-06, 19:13
Hey Jun, question: What version of Java are your clients running? I keep hitting a bug in my java5 test suite and I'm wondering if in fact I am seeing the same problem you're reporting here.
C
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-08, 20:51
We are on java 6.
Jun
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-08, 20:55
Ben,
The log is binary. Is there a log reader? Also, can I just look at the log on any ZooKeeper server?

Thanks,

Jun
-
Re: lost ZK events across datacenters
Benjamin Reed 2011-06-08, 21:02
yes, the LogFormatter class will do it for you.
ben
-
Re: lost ZK events across datacenters
Jun Rao 2011-06-10, 06:02
Hmm, those logs are pretty big, there is a 67MB file per hour.
Jun
-
Re: lost ZK events across datacenters
Ted Dunning 2011-06-10, 06:09
Yes.
Don't attach them to the email. You should still be able to get useful information from them.

On Fri, Jun 10, 2011 at 8:02 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
> Hmm, those logs are pretty big, there is a 67MB file per hour.
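One way to get useful information out of logs that large is to narrow them to the watched path before reading. The sketch below assumes the binary log has already been turned into text (for instance with the LogFormatter class mentioned earlier in the thread) and makes no assumption about the line layout beyond the znode path appearing somewhere in each relevant line. `TxnLogGrep` and `linesMentioning` are hypothetical names, and the code assumes Java 8 or later for the streams API.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper, not part of ZooKeeper: keeps only the formatted log
// lines that mention a given znode path.
class TxnLogGrep {
    static List<String> linesMentioning(Stream<String> lines, String znodePath) {
        return lines
                .filter(line -> line.contains(znodePath)) // plain substring match
                .collect(Collectors.toList());
    }
}
```

Passing `java.nio.file.Files.lines(...)` over the formatted text file as the stream reads the file lazily, so an hourly 67MB file does not need to be held in memory at once. Note that a substring match on a parent path also keeps lines for its children, which is what a child watcher investigation needs.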