Zookeeper, mail # user - Node being there and not at the same time


+
Mattias Persson 2012-08-23, 10:30
+
David Nickerson 2012-08-23, 14:53
+
Mattias Persson 2012-08-23, 15:21
+
Bill Bridge 2012-08-25, 00:15
+
Alexander Shraer 2012-08-25, 01:11
+
Bill Bridge 2012-08-27, 17:22
Re: Node being there and not at the same time
Alexander Shraer 2012-08-27, 17:40
Hi Bill,

agreed - if the client's session expires then this is possible.
I don't believe that this is what's happening here, though: peers
usually catch up on commits really quickly, while session expiration
takes some time, so it's unlikely that after expiration the client
reconnects and finds a peer that is still less up-to-date. More likely
he's creating a new client handle, or there is some other issue, as
Camille suggests.

Thanks,
Alex
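
The sync()-then-retry workaround discussed in the quoted messages below can be sketched roughly as follows. This is only an illustration under stated assumptions: NodeReader and LaggingFollower are hypothetical stand-ins invented for this sketch, not the real org.apache.zookeeper.ZooKeeper API. In the real client, a missing node is reported through a KeeperException.NoNodeException rather than a null return, and sync() is asynchronous and takes a callback.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the small part of a ZooKeeper client used here.
// It is NOT the real ZooKeeper API; names and signatures are assumptions.
interface NodeReader {
    // Returns the node's data, or null if the connected server
    // does not know the node (the NONODE case).
    byte[] getData(String path);

    // Makes the connected server catch up with the leader.
    void sync(String path);
}

// Toy follower that lags behind the leader until sync() is called.
class LaggingFollower implements NodeReader {
    private final Map<String, byte[]> leader = new HashMap<>();
    private final Map<String, byte[]> local = new HashMap<>();

    // Simulates a create that is committed but not yet visible on this follower.
    void commitOnLeader(String path, byte[] data) {
        leader.put(path, data);
    }

    @Override
    public byte[] getData(String path) {
        return local.get(path);
    }

    @Override
    public void sync(String path) {
        local.putAll(leader);
    }
}

class SyncRetrySketch {
    // Called after create() reported NODEEXISTS: read the node, and only
    // if the read unexpectedly reports NONODE, sync and read once more.
    static byte[] readAfterNodeExists(NodeReader zk, String path) {
        byte[] data = zk.getData(path);
        if (data != null) {
            return data;          // common case: no sync() needed
        }
        zk.sync(path);            // rare race: force the server to catch up
        return zk.getData(path);  // still null if the node is really gone
    }

    public static void main(String[] args) {
        LaggingFollower follower = new LaggingFollower();
        follower.commitOnLeader("/example", new byte[] {42});
        byte[] data = readAfterNodeExists(follower, "/example");
        System.out.println(data == null ? "NONODE" : "found " + data[0]);
    }
}
```

Because the sync() is issued only on the unexpected NONODE path, the common case pays nothing extra, which is the point Bill makes in his first message.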

On Mon, Aug 27, 2012 at 10:22 AM, Bill Bridge <[EMAIL PROTECTED]> wrote:
> Alex,
> You certainly know the code much better than I, so I may be mistaken here.
> It looks to me like waitForEpochAck() is about changes in the set of peers,
> and is not related to client connect/disconnects. I do not see how this
> would be called if a client disconnected due to some problem of his own,
> such as too slow to heartbeat, then reconnected to a different peer or
> observer.
>
> You suggest that a reconnecting client should ensure the new server has seen
> all transactions that the client has seen. This sounds like the right thing
> to do. This would certainly eliminate the race condition I postulated. This
> sounds like the kind of thing someone would have already thought of. If this
> is not already done then it would be a good change to make. I do not know
> where the code to do that would be. It could be part of the server reconnect
> code or it could be a sync() in the client library.
>
> If Mattias's code creates a new session when reconnecting, rather than
> reconnecting to the same session, then he could have the problem described
> even if reconnect ensures the client is not ahead of the server. He could
> fix this either by reconnecting to the same session, or simply doing a
> sync() when necessary.
>
> Thanks,
> Bill
>
>
> On 8/24/2012 6:11 PM, Alexander Shraer wrote:
>>
>> Bill,  if I understand correctly this shouldn't be possible - the
>> client will not be able to connect to a server that is
>> less up-to-date than that same client. So if the create completed at
>> the client before it disconnects the new server will have to know
>> about it too otherwise the connection will fail. See
>> Leader.waitForEpochAck:
>>
>> if (ss.isMoreRecentThan(leaderStateSummary)) {
>>     throw new IOException("Follower is ahead of the leader, leader summary: "
>>             + leaderStateSummary.getCurrentEpoch()
>>             + " (current epoch), "
>>             + leaderStateSummary.getLastZxid()
>>             + " (last zxid)");
>> }
>>
>> Of course, it's possible that another client connected to a different
>> server doesn't see the create.
>>
>> Alex
>>
>>
>> On Fri, Aug 24, 2012 at 5:15 PM, Bill Bridge <[EMAIL PROTECTED]> wrote:
>>>
>>> Mattias,
>>>
>>> Is it possible that after you get NODEEXISTS from creation and before you do
>>> the second getData(), you reconnect to another ZooKeeper instance? If so,
>>> maybe the new connection is to a follower that has not yet seen the
>>> creation. If this is what is happening, then a sync() after the second
>>> NONODE, followed by a third getData(), should work. By only doing the sync()
>>> when you hit the unusual race condition, it will have no performance impact.
>>>
>>> Bill
>>>
>>>
>>> On 8/23/2012 8:21 AM, Mattias Persson wrote:
>>>>
>>>> Hi David,
>>>>
>>>> There is nowhere in the code where that node gets deleted. If we refrain
>>>> from that suspicion, could there be something else?
>>>>
>>>> 2012/8/23 David Nickerson <[EMAIL PROTECTED]>
>>>>
>>>>> It's a little difficult to guess what your application is doing, but it
>>>>> sounds like there's "someone else" who can create and delete the nodes
>>>>> you're trying to work with. So when you create the node and check its data,
>>>>> someone else might have deleted it before you got the chance to check
+
Alexander Shraer 2012-08-31, 05:21
+
Bill Bridge 2012-08-31, 05:50
+
Alexander Shraer 2012-08-31, 06:04
+
Mattias Persson 2012-08-31, 07:00
+
Camille Fournier 2012-08-25, 01:17