Zookeeper, mail # user - puzzling BadVersionException


Re: puzzling BadVersionException
Ishaaq Chandy 2011-10-11, 05:55
Ok, false alarm - the problem was a mis-configuration in our code that was
causing multiple processes to update that znode whereas only one should
have.

Apologies for wasting your time.

Ishaaq
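
For anyone who finds this thread with the same symptom, here is a minimal sketch of how a second writer produces it. It is an in-memory model only: `VersionedNode`, `BadVersion` and `staleWriteGap` are hypothetical names made up for illustration, not the ZooKeeper client API. It assumes nothing more than what the thread describes, a write that succeeds only if the caller's expected version matches the stored one.

```java
public class VersionedWriteDemo {

    /** Thrown when the caller's expected version does not match the stored one. */
    static class BadVersion extends Exception {
        BadVersion(int expected, int actual) {
            super("expected version " + expected + " but node is at " + actual);
        }
    }

    /** Hypothetical in-memory stand-in for a single znode (not the ZooKeeper API). */
    static class VersionedNode {
        private byte[] data = new byte[0];
        private int version = 0;

        synchronized int version() {
            return version;
        }

        /** Conditional write: succeeds only if expectedVersion is current. */
        synchronized int setData(byte[] newData, int expectedVersion) throws BadVersion {
            if (expectedVersion != version) {
                throw new BadVersion(expectedVersion, version);
            }
            data = newData.clone();
            return ++version;
        }
    }

    /**
     * Writer A reads the version, writer B updates the node in between,
     * then A tries to write with its stale version.
     * Returns (actual version - version A expected) when A's write is rejected.
     */
    static int staleWriteGap() {
        VersionedNode node = new VersionedNode();
        int seenByA = node.version();                           // A's "exists()" step
        try {
            node.setData("from B".getBytes(), node.version()); // B sneaks in
            node.setData("from A".getBytes(), seenByA);        // A uses a stale version
            return 0;
        } catch (BadVersion e) {
            return node.version() - seenByA;
        }
    }

    public static void main(String[] args) {
        System.out.println("version gap seen by writer A: " + staleWriteGap());
    }
}
```

In this model the rejected writer sees a version exactly one higher than it expected, which matches what our trace logging showed. A single intervening write from another process is enough to cause it.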

On 11 October 2011 13:09, Ishaaq Chandy <[EMAIL PROTECTED]> wrote:

> Technically we don't need the contents as we're going to overwrite it
> anyway, we're just asserting the fact that we're the only one writing to
> that node.
>
> Was just checking if it is a known issue - clearly not, so I'll continue
> investigating our code.
>
> Thanks,
> Ishaaq
>
>
> On 11 October 2011 12:21, Ted Dunning <[EMAIL PROTECTED]> wrote:
>
>> Why do you get the version in the first place without getting the
>> contents?
>>
>> If you don't have the contents, what is the point of enforcing a version?
>>
>> On Mon, Oct 10, 2011 at 8:26 AM, Ishaaq Chandy <[EMAIL PROTECTED]> wrote:
>>
>> > Thanks Mahadev,
>> > Yup, I am aware of the fact that 2 is a particularly bad number for
>> > cluster size and hopefully we should fix that soon. I was just hoping
>> > that this was the reason the problem is occurring. My conjecture was
>> > that, e.g., if the two zk servers disagree about the version there is
>> > no way to decide who is correct without a third tie-breaker server.
>> >
>> > But, if you say that is not the case, then I need to keep looking (sigh).
>> >
>> > I am pretty sure that only one thread is touching that znode. We put in
>> > some trace logging to try and pinpoint the problem and noticed that
>> > every time we get the BadVersionException the actual version on the
>> > znode is one more than what we expected it to be based on the previous
>> > "exists()" call.
>> >
>> > As I said, this code gets called once every 2 seconds (or thereabouts).
>> > It seems to fail with a BadVersionException about 3 times an hour (on
>> > average).
>> >
>> > By the way, not sure if it is relevant, but the reason we are using 2
>> > nodes in the cluster and the reason why their version is 3.2.2 is
>> > because they are the ZKs that come embedded inside HBase (we're running
>> > 2 HBase regionservers) - I've been meaning to pull them out and run them
>> > standalone but just haven't got around to it (yet).
>> >
>> > Ishaaq
>> >
>> > On 10 October 2011 17:35, Mahadev Konar <[EMAIL PROTECTED]> wrote:
>> >
>> > > Ishaaq,
>> > > 2 ZK servers is definitely not the right number for running a ZK
>> > > service, but it's no reason to get a BadVersionException. For more
>> > > information on the size of the ZK ensemble take a look at:
>> > >
>> > > http://zookeeper.apache.org/doc/r3.3.3/zookeeperAdmin.html
>> > >
>> > > As for the version on the znode, can you try reading the version when
>> > > setData throws the BadVersionException?
>> > >
>> > > Also, is there any chance of a delete on the znode that removes it and
>> > > another create happens for the same path?
>> > >
>> > > I don't think we have seen this version issue in the releases, so I'd
>> > > be inclined to say that there could be something in the code that's
>> > > making some changes to the znode before you set the data.
>> > >
>> > > Hope that helps
>> > > thanks
>> > > mahadev
>> > >
>> > > On Fri, Oct 7, 2011 at 6:47 PM, Ishaaq Chandy <[EMAIL PROTECTED]> wrote:
>> > > > Hi all,
>> > > >
>> > > > We're seeing a puzzling error. Here's the scenario:
>> > > >
>> > > > 1. We have a single thread that wakes up every two seconds (give or
>> > > > take) and does some work
>> > > > 2. As part of that work it updates a node on ZK. When it does this it
>> > > > first gets the Stat of the existing node and uses the version
>> > > > retrieved from it to update the value.
>> > > > 3. There are no other processes updating the node
>> > > >
>> > > > The code goes something like this:
>> > > >  final Stat stat = zooKeeper.exists(path, false);
>> > > > // do some other work here to create the path if it does not exist -
>> > this