Re: puzzling BadVersionException
Ishaaq Chandy 2011-10-11, 05:55
Ok, false alarm - the problem was a misconfiguration in our code that was
causing multiple processes to update that znode whereas only one should have.

Apologies for wasting your time.

Ishaaq

On 11 October 2011 13:09, Ishaaq Chandy <[EMAIL PROTECTED]> wrote:

> Technically we don't need the contents as we're going to overwrite it
> anyway, we're just asserting the fact that we're the only one writing to
> that node.
>
> Was just checking if it is a known issue - clearly not, so I'll continue
> investigating our code.
>
> Thanks,
> Ishaaq
>
>
> On 11 October 2011 12:21, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Why do you get the version in the first place without getting the
>> contents?
>>
>> If you don't have the contents, what is the point of enforcing a version?
>>
>> On Mon, Oct 10, 2011 at 8:26 AM, Ishaaq Chandy <[EMAIL PROTECTED]> wrote:
>>
>> > Thanks Mahadev,
>> > Yup, I am aware of the fact that 2 is a particularly bad number for
>> > cluster size and hopefully we should fix that soon, I was just hoping
>> > that for some reason that was why the problem is occurring - my
>> > conjecture was, for e.g. if the two zk servers disagree about the
>> > version there is no way to decide who is correct without a third
>> > tie-breaker server.
>> >
>> > But, if you say that is not the case, then I need to keep looking
>> > (sigh).
>> >
>> > I am pretty sure that only one thread is touching that znode. We put in
>> > some trace logging to try and pinpoint the problem and noticed that
>> > every time we get the BadVersionException the actual version on the
>> > znode is one more than what we expected it to be based on the previous
>> > "exists()" call.
>> >
>> > As I said, this code gets called once every 2 seconds (or thereabouts).
>> > It seems to fail with a BadVersionException about 3 times an hour (on
>> > average).
>> >
>> > By the way, not sure if it is relevant, but the reason we are using 2
>> > nodes in the cluster and the reason why their version is 3.2.2 is
>> > because they are the ZKs that come embedded inside HBase (we're running
>> > 2 HBase regionservers) - I've been meaning to pull them out and run
>> > them standalone but just haven't got around to it (yet).
>> >
>> > Ishaaq
>> >
>> > On 10 October 2011 17:35, Mahadev Konar <[EMAIL PROTECTED]> wrote:
>> >
>> > > Ishaaq,
>> > > 2 ZK servers is definitely not the right number for running a ZK
>> > > service but it's no reason to get a BadVersion exception because of
>> > > that. For more information on the size of the ZK ensemble take a look
>> > > at:
>> > >
>> > > http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
>> > >
>> > > As for the version on the znode, can you try reading the version when
>> > > you get a setData/BadException?
>> > >
>> > > Also, is there any chance of a delete on the znode that removes it and
>> > > another create happens for the same path?
>> > >
>> > > I don't think we have seen this version issue in the releases, so I'd
>> > > be inclined to say that there could be something in the code that's
>> > > making some changes to the znode before you set the data.
>> > >
>> > > Hope that helps
>> > > thanks
>> > > mahadev
>> > >
>> > > On Fri, Oct 7, 2011 at 6:47 PM, Ishaaq Chandy <[EMAIL PROTECTED]> wrote:
>> > > > Hi all,
>> > > >
>> > > > We're seeing a puzzling error. Here's the scenario:
>> > > >
>> > > > 1. We have a single thread that wakes up every two seconds (give or
>> > > > take) and does some work
>> > > > 2. As part of that work it updates a node on ZK. When it does this
>> > > > it first gets the Stat of the existing node and uses the version
>> > > > retrieved from it to update the value.
>> > > > 3. There are no other processes updating the node
>> > > >
>> > > > The code goes something like this:
>> > > > final Stat stat = zooKeeper.exists(path, false);
>> > > > // do some other work here to create the path if it does not exist - this
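
For anyone landing on this thread later: the symptom described above (the znode's version being exactly one more than the version returned by the earlier exists() call) is what a second writer slipping in between the read and the conditional write would produce. Below is a minimal sketch of that race using an in-memory stand-in. It does not use the ZooKeeper client at all: `VersionedNode`, `BadVersionDemo` and `firstWriterRejected` are hypothetical names made up for illustration, and `IllegalStateException` stands in for the `KeeperException.BadVersionException` that the real `setData(path, data, version)` throws.

```java
// In-memory stand-in for a znode. This is NOT the ZooKeeper API;
// every name in this sketch is hypothetical.
final class VersionedNode {
    private byte[] data = new byte[0];
    private int version = 0;

    /** Plays the role of exists(): reports the current version only. */
    synchronized int currentVersion() {
        return version;
    }

    synchronized byte[] getData() {
        return data.clone();
    }

    /** Plays the role of setData(path, data, version): a conditional write. */
    synchronized int setData(byte[] newData, int expectedVersion) {
        if (expectedVersion != version) {
            // Stand-in for KeeperException.BadVersionException.
            throw new IllegalStateException(
                "bad version: expected " + expectedVersion
                    + " but node is at " + version);
        }
        data = newData.clone();
        version++;
        return version;
    }
}

public class BadVersionDemo {
    /** Returns true if writer A's conditional update was rejected. */
    static boolean firstWriterRejected(boolean secondWriterActive) {
        VersionedNode node = new VersionedNode();

        // Writer A reads the version (the exists() step in the thread).
        int seenByA = node.currentVersion();

        // Optionally, writer B updates the node before A gets to write.
        if (secondWriterActive) {
            node.setData(new byte[] {2}, node.currentVersion());
        }

        // Writer A now writes with the version it read earlier.
        try {
            node.setData(new byte[] {1}, seenByA);
            return false;
        } catch (IllegalStateException e) {
            return true; // the node is now one version ahead of seenByA
        }
    }

    public static void main(String[] args) {
        System.out.println(firstWriterRejected(false)); // prints false
        System.out.println(firstWriterRejected(true));  // prints true
    }
}
```

With a single writer the conditional write always succeeds; the rejection only appears once a second writer exists, which matches the root cause reported at the top of the thread.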