Zookeeper, mail # user - znode metadata consistency


Re: znode metadata consistency
Vishal Kher 2011-03-01, 22:59
Hi Jeremy,

I just realized that you are using a standalone ZK server. I don't think the
bugs apply to you, so I don't have an answer to your question.
I think 3.3.3 should be released soon:
http://zookeeper-dev.578911.n2.nabble.com/VOTE-Release-ZooKeeper-3-3-3-candidate-1-td6059109.html

-Vishal

On Tue, Mar 1, 2011 at 4:15 PM, Jeremy Stribling <[EMAIL PROTECTED]> wrote:

> Thanks for the pointers Vishal, I hadn't seen those.  They look like they
> could be related, but without knowing how metadata updates are grouped into
> transactions, it's hard for me to say.  I would expect the cversion update
> to happen within the same transaction as the creation of a new child, but if
> they get written to the log in two separate steps, perhaps these issues
> could explain it.
>
> Any estimate on when 3.3.3 will be released?  I haven't seen any updates on
> the user list about it.  Thanks,
>
> Jeremy
>
>
> On 03/01/2011 12:40 PM, Vishal Kher wrote:
>
>> Hi Jeremy,
>>
>> One of the main reasons for 3.3.3 release was to include fixes for znode
>> inconsistency bugs.
>> Have you taken a look at
>> https://issues.apache.org/jira/browse/ZOOKEEPER-962 and
>> https://issues.apache.org/jira/browse/ZOOKEEPER-919?
>> The problem that you are seeing sounds similar to the ones reported.
>>
>> -Vishal
>>
>>
>>
>> On Mon, Feb 28, 2011 at 8:04 PM, Jeremy Stribling <[EMAIL PROTECTED]> wrote:
>>
>>
>>
>>> Hi all,
>>>
>>> A while back I noticed that my Zookeeper cluster got into a state where I
>>> would get a "node exists" error back when creating a sequential znode -- see
>>> the thread starting at
>>> http://mail-archives.apache.org/mod_mbox/hadoop-zookeeper-user/201010.mbox/%[EMAIL PROTECTED]%3E
>>> for more details.  The summary is that at the time, my application had a bug
>>> that could have been improperly bringing new nodes into a cluster.
>>>
>>> However, I've seen this a couple more times since fixing that original bug.
>>> I don't yet know how to reproduce it, but I am going to keep trying.  In
>>> one case, we restarted a node (in a one-node cluster), and when it came back
>>> up we could no longer create sequential nodes on a certain parent node; the
>>> create failed with a node exists (-110) error code.  The biggest child it saw
>>> on restart was /zkrsm/000000000000002d_record0000120804 (i.e., a sequence
>>> number of 120804).  However, a stat on the parent node revealed that the
>>> cversion was only 120710:
>>>
>>> [zk:<ip:port>(CONNECTED) 3] stat /zkrsm
>>> cZxid = 0x5
>>> ctime = Mon Jan 17 18:28:19 PST 2011
>>> mZxid = 0x5
>>> mtime = Mon Jan 17 18:28:19 PST 2011
>>> pZxid = 0x1d819
>>> cversion = 120710
>>> dataVersion = 0
>>> aclVersion = 0
>>> ephemeralOwner = 0x0
>>> dataLength = 0
>>> numChildren = 2955
>>>
>>> So my question is: how is znode metadata persisted with respect to the
>>> actual znodes?  Is it possible that a node's children will get synced to
>>> disk before its own metadata, and if it crashes at a bad time, the metadata
>>> updates will be lost?  If so, is there any way to constrain Zookeeper so
>>> that it will sync its metadata before returning success for write operations?
>>>
>>> (I'm using Zookeeper 3.3.2 on a Debian Squeeze 64-bit box, with
>>> openjdk-6-jre 6b18-1.8.3-2.)
>>>
>>> I'd be happy to create a JIRA for this if that seems useful, but without a
>>> way to reproduce it I'm not sure that it is.
>>>
>>> Thanks,
>>>
>>> Jeremy
>>>
>>>
>>>
>>>
>>
>>
>
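
The mismatch reported in the original message (largest child sequence number 120804, parent cversion 120710) matters because, as far as I understand the 3.3.x server, the numeric suffix of a sequential znode is derived from the parent's cversion. The following is a minimal Python sketch of that relationship, not ZooKeeper code. The helper names (`sequential_name`, `sequence_of`, `cversion_is_stale`) are hypothetical, and the zero-padded 10-digit suffix is an assumption based on the child name shown in the thread.

```python
import re


def sequential_name(prefix: str, parent_cversion: int) -> str:
    """Build a sequential znode path by appending a zero-padded,
    10-digit counter (assumed to be the parent's cversion) to the prefix."""
    return f"{prefix}{parent_cversion:010d}"


def sequence_of(child: str) -> int:
    """Return the trailing 10-digit sequence number of a child name."""
    match = re.search(r"(\d{10})$", child)
    if match is None:
        raise ValueError(f"no 10-digit sequence suffix in {child!r}")
    return int(match.group(1))


def cversion_is_stale(parent_cversion: int, children: list[str]) -> bool:
    """True if some existing child already carries a sequence number at or
    above the parent's cversion, so a later create could reuse a suffix."""
    return any(sequence_of(child) >= parent_cversion for child in children)


# Values taken from the thread: the largest child seen after restart and
# the cversion reported by `stat /zkrsm`.
children = ["000000000000002d_record0000120804"]
cversion = 120710

print(sequence_of(children[0]))
print(cversion_is_stale(cversion, children))
print(sequential_name("/zkrsm/000000000000002d_record", cversion))
```

Under these assumptions, the next create would be handed a suffix below one already in use. Whether it actually collides depends on the prefix the client supplies, which would be consistent with the intermittent "node exists" (-110) errors described above.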