I've created https://issues.apache.org/jira/browse/ZOOKEEPER-1412 for this.
I've also debugged this further and found that the issue does not exist in
the C client, so there has to be a client-side difference that avoids it.
What I found is that the C client updates lastzxid on every server response,
but the Java client doesn't update it on ping, auth or notification responses.
I think the most trivial fix would be to change the Java client to update
lastzxid on those responses as well. Updating on ping looks especially
promising because pings are received very often, and I've checked server side
that the zxid is updated
Do you see any side effects or downsides to this change?
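
To make the proposed change concrete, here is a minimal, self-contained
sketch of the two update policies. It is a simplified model under stated
assumptions, not the actual client code: the class, enum and method names
(LastZxidSketch, ReplyKind, onReply) are hypothetical, and the zxid value 42
is only an illustrative number.

```java
// Simplified model of client-side lastzxid tracking. NOT the real ZooKeeper
// client code; every name in this block is hypothetical.
final class LastZxidSketch {
    enum ReplyKind { PING, AUTH, NOTIFICATION, REQUEST_REPLY }

    // false: only replies to ordinary requests update lastZxid
    // true:  ping, auth and notification replies update it as well
    private final boolean updateOnEveryReply;
    private long lastZxid = 0L; // 0 on startup, as described in the thread

    LastZxidSketch(boolean updateOnEveryReply) {
        this.updateOnEveryReply = updateOnEveryReply;
    }

    /** Records the zxid carried by a reply header, depending on the policy. */
    void onReply(ReplyKind kind, long replyZxid) {
        boolean tracked = updateOnEveryReply || kind == ReplyKind.REQUEST_REPLY;
        // Skip non-positive zxids so a reply that echoes 0 never resets the value.
        if (tracked && replyZxid > 0) {
            lastZxid = replyZxid;
        }
    }

    long lastZxid() {
        return lastZxid;
    }

    public static void main(String[] args) {
        LastZxidSketch current = new LastZxidSketch(false);
        LastZxidSketch proposed = new LastZxidSketch(true);
        for (LastZxidSketch client : new LastZxidSketch[] {current, proposed}) {
            // Read served by a follower: the reply echoes the client's own zxid (0).
            client.onReply(ReplyKind.REQUEST_REPLY, 0L);
            // Ping reply assumed to carry the server's current zxid.
            client.onReply(ReplyKind.PING, 42L);
        }
        System.out.println("update on request replies only: " + current.lastZxid());
        System.out.println("update on every reply:          " + proposed.lastZxid());
    }
}
```

With the first policy the client still holds 0 when it reconnects, which is
the case where all watches fire. With the second policy the ping reply alone
is enough to move lastzxid forward.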
On Fri, Mar 9, 2012 at 18:30, Patrick Hunt <[EMAIL PROTECTED]> wrote:
> Hi Botond, great detective work. I believe you are correct. In
> FinalRequestProcessor we do:
> > ReplyHeader hdr = new ReplyHeader(request.cxid, request.zxid,
> The request zxid is updated on a write, but not a read. So if you
> connect a client, set a watch, and do no writes, and the client is then
> disconnected and reconnected, the zxid it sends to the new server
> will still be 0.
> Notice that ConnectResponse does not include the latest zxid, so
> really the write is the time that we first send the server's current
> zxid to the client. At first look it seems that we should be sending
> the server's zxid back to the client, regardless of read or write.
> Should be a simple fix (ha!).
> Can you enter a jira for this? Thanks.
> On Fri, Mar 9, 2012 at 3:08 AM, Botond Hejj
> <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I've observed an inconsistent behavior in java client watches. The
> > inconsistency relates to the behavior after the client reconnects to the
> > zookeeper ensemble.
> > The documentation is not completely clear to me on this, but if I am not
> > mistaken, then after the client reconnects to the ensemble only those
> > watches should trigger which would also have been triggered if the
> > connection had not been lost. This means that if I watch for changes in
> > node /foo and there is no change there, then my watch should not be
> > triggered on reconnecting to the ensemble.
> > This is not always the case in the java client.
> > I've debugged the issue and I could locate the case when the watch is
> > always triggered on reconnect. This consistently happens if I connect
> > to a follower in the ensemble and I don't do any operation which goes
> > through the leader.
> > Looking at the code I see that the client stores the lastzxid and sends
> > that with its request. This is 0 on startup and will be updated every time
> > from the server replies. This lastzxid is also sent to the server after
> > reconnect together with the watches. The server decides which watches to
> > trigger based on this lastzxid, probably because that should mean the last
> > known state of the client. If this lastzxid is 0 then all the watches are
> > triggered.
> > I've checked why this lastzxid is 0. I thought it shouldn't be, since there
> > was already a request to the server to set the watch and in the reply the
> > server could have sent back the zxid, but it turns out that it sends just
> > Looking at the server code I see that for requests which don't go through
> > the leader the follower server just sends back the same zxid that the
> > client sent.
> > Could anyone who is more familiar with the codebase comment on this? I
> > think this is a bug and there doesn't seem to be a straightforward way to
> > fix it.
> > (I've done most of the tests with a 3.3.3 server/client but this seems to
> > be the case in other versions)
> > Regards,
> > Botond Hejj
> > Morgan Stanley | Technology
> > Lechner Odon fasor 8 | Floor 07
> > Budapest, 1095
> > Phone: +36 1 881-3962
> > [EMAIL PROTECTED]