Zookeeper >> mail # user >> Fwd: Dealing with data inconsistency


Fwd: Dealing with data inconsistency
Jordan and I were talking about how the recipes work correctly despite the possibility of a client seeing stale data due to data inconsistency. Sharing this for the benefit of the community.
Begin forwarded message:

> From: Jordan Zimmerman <[EMAIL PROTECTED]>
> Date: November 24, 2012, 6:57:55 PM PST
> To: Narayanan Arunachalam <[EMAIL PROTECTED]>
> Subject: Re: Dealing with data inconsistency
> That's right. Most of the recipes are "backwards looking". They write a sequential node, then get children to see where their node sorts to.
>
> -JZ
>
> On Nov 24, 2012, at 6:56 PM, Narayanan Arunachalam <[EMAIL PROTECTED]> wrote:
>
>> So I believe all the recipes leverage this pattern and are not affected by stale data.
>>
>> On Nov 24, 2012, at 4:12 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>
>>> c1 does not need to see lock_01. It doesn't affect who gets the lock. c1 is the locker.
>>>
>>> -JZ
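
The point above can be checked with a small sketch. This is not the ZooKeeper client API: `sequence` and `holds_lock` are hypothetical helpers, and the code assumes node names end in `_<digits>` as in the lock_00 / lock_01 example from this thread. It shows that c1's decision depends only on nodes that sort before its own, so a view that misses lock_01 gives the same answer.

```python
def sequence(node):
    # Hypothetical helper: "lock_01" -> 1. Assumes the name ends in "_<digits>".
    return int(node.rsplit("_", 1)[1])


def holds_lock(my_node, children):
    # The check is "backwards looking": only nodes with a smaller
    # sequence number than my_node can prevent it from holding the lock.
    return not any(sequence(c) < sequence(my_node) for c in children)


# c1 created lock_00. Its getChildren may or may not include c2's lock_01.
view_without_c2 = ["lock_00"]
view_with_c2 = ["lock_00", "lock_01"]

c1_stale = holds_lock("lock_00", view_without_c2)
c1_fresh = holds_lock("lock_00", view_with_c2)

# c2 created lock_01. Its own write guarantees that it sees lock_00.
c2_result = holds_lock("lock_01", ["lock_00", "lock_01"])

print(c1_stale, c1_fresh, c2_result)  # True True False
```

Either view tells c1 that it is the locker, while c2 always sees the earlier lock_00 and knows it has to wait.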
>>>
>>> On Nov 24, 2012, at 4:07 PM, Narayanan Arunachalam <[EMAIL PROTECTED]> wrote:
>>>
>>>> Client c1 creates a lock node my_lock/lock_00 at time t1 and calls getChildren at time t2. In the interval between t1 and t2, client c2 creates a lock node my_lock/lock_01. There is a possibility that client c1 won't see the lock_01 node.
>>>>
>>>> Will send our conversation to the community as well.
>>>>
>>>> On Nov 24, 2012, at 2:40 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>
>>>>> These questions might be better for the zookeeper-user list. But, I'll give it a shot: In ZooKeeper, writes always go through the leader. Writes are also always ordered. So, as part of the write process, the server that the client is connected to does a sync with the leader. Therefore, any writes that precede the write being processed are visible to the client post-write. Reads, however, are processed only by the server the client is connected to. Thus, you get a potentially stale view of the DB from a read.
>>>>>
>>>>> -JZ
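
The description above can be illustrated with a toy model. This is a deliberately simplified sketch, not ZooKeeper's implementation: the `Leader` and `Follower` classes are hypothetical, there is no quorum or networking, and followers here never catch up on their own (real servers do so asynchronously). It only captures the two properties stated in the message: a write brings the client's server up to date with every earlier write, and a read is answered from that server's local copy.

```python
class Leader:
    """Holds the single, totally ordered log of writes (toy model)."""

    def __init__(self):
        self.log = []  # node names in commit order

    def commit(self, path):
        self.log.append(path)
        return len(self.log)  # position of this write in the global order


class Follower:
    """A server that answers reads from its own, possibly lagging, copy."""

    def __init__(self, leader):
        self.leader = leader
        self.applied = 0  # number of log entries this server has applied

    def create(self, path):
        # Writes go through the leader. This server applies everything up to
        # and including the new write before acknowledging it to the client.
        self.applied = self.leader.commit(path)

    def get_children(self):
        # Reads are served locally, with no round trip to the leader.
        return list(self.leader.log[:self.applied])


leader = Leader()
s1, s2 = Follower(leader), Follower(leader)

s2.create("lock_00")       # a client connected to s2 writes
stale = s1.get_children()  # a client connected to s1 reads: s1 is behind
s1.create("lock_01")       # that client writes through s1
fresh = s1.get_children()  # the same read now includes the earlier write

print(stale)  # []
print(fresh)  # ['lock_00', 'lock_01']
```

The first read through s1 misses lock_00, but once the client has written through s1, every node created before its own write is visible to it.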
>>>>>
>>>>> On Nov 24, 2012, at 2:36 PM, Narayanan Arunachalam <[EMAIL PROTECTED]> wrote:
>>>>>
>>>>>> I might be missing something here. Can you describe a simple scenario where a client can get stale data?
>>>>>>
>>>>>> My confusion comes from the thought that if a write by some client always precedes a read, then the database is always in sync?
>>>>>>
>>>>>> On Nov 24, 2012, at 1:53 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>>>
>>>>>>> getData, getChildren and exists can return stale data. However, whenever you do a write operation, the database is synced. Therefore, if you do a write and then call getChildren, you are guaranteed to get all the nodes that were created before yours. Thus, if your node is the first node, you are guaranteed to be the lock holder.
>>>>>>>
>>>>>>> -JZ
>>>>>>>
>>>>>>> On Nov 24, 2012, at 1:22 PM, Narayanan Arunachalam <[EMAIL PROTECTED]> wrote:
>>>>>>>
>>>>>>>> So you mean only getData can return inconsistent values. A node create is always synced.
>>>>>>>>
>>>>>>>> On Nov 24, 2012, at 1:12 PM, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
>>>>>>>>
>>>>>>>>> Only reads are potentially inconsistent. All write operations imply a sync. In the lock recipe, the getChildren call is used only to look for previously created nodes. Because the locker has created a new node, there's an implied sync. Thus, the getChildren() will be accurate for the previously created nodes.
>>>>>>>>>
>>>>>>>>> -JZ
>>>>>>>>>
>>>>>>>>> On Nov 24, 2012, at 12:44 PM, Narayanan Arunachalam <[EMAIL PROTECTED]> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Jordan,
>>>>>>>>>>
>>>>>>>>>> An update request to ZK is considered complete once a majority of the servers agree. A client could get stale data if it sends a request to a server that hasn't received the update yet. How is this taken care of in the lock recipe so that it works correctly? For example, see the steps I took from the ZK doc.