Zookeeper >> mail # user >> Node being there and not at the same time

Re: Node being there and not at the same time
This sounds like a good idea. I'm not sure how easy it would be to
implement as the client may need to be in a new sort of "conditional" state.


On Thu, Aug 30, 2012 at 10:50 PM, Bill Bridge <[EMAIL PROTECTED]> wrote:

>  Nothing to be sorry about, I was wrong to suggest a client could see an
> old state by reconnecting. When you said that it should not be allowed I
> realized that had to be the case. I saw that email too and realized it had
> something to do with this subject.
> It would seem nicer to simply do a sync() when this happens rather than
> refusing the connection. We could destroy the connection if the client is
> still in the future after a sync(). There is something seriously wrong if
> the client is still in the future after a sync(). If this happened with the
> current code the client would just keep trying until the connection finally
> worked and we would not find out that something is wrong. I suppose the
> client's last zxid could have been corrupted in his memory causing this
> problem. It would be good to have this disconnect and fail the client
> rather than spin.
> Without the connection you cannot do the sync() yourself. It is
> conceivable that it will be a few seconds before there is another server
> that is current enough to connect with. Maybe the other servers are in
> different data centers and it would not be efficient to connect to them.
> Bill
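
[Editor's sketch] The proposal in the message above (sync first, and fail the client only if it is still "in the future" afterwards) can be illustrated with a small decision function. This is a minimal sketch under stated assumptions, not existing ZooKeeper behaviour: the class SyncThenAdmit, the Outcome values and the leaderLastZxid parameter are all hypothetical names, and the sync() is modelled simply as "the server catches up to the leader's zxid".

```java
// Hypothetical model of the proposal; none of these names exist in ZooKeeper.
final class SyncThenAdmit {
    enum Outcome { ACCEPTED, ACCEPTED_AFTER_SYNC, CLIENT_FAILED }

    /**
     * @param clientLastZxid last zxid the client reports having seen
     * @param serverLastZxid last zxid this server has processed
     * @param leaderLastZxid zxid the server would reach after a sync() with the leader
     */
    static Outcome connect(long clientLastZxid, long serverLastZxid, long leaderLastZxid) {
        if (clientLastZxid <= serverLastZxid) {
            return Outcome.ACCEPTED;              // server is already current enough
        }
        long afterSync = Math.max(serverLastZxid, leaderLastZxid);
        if (clientLastZxid <= afterSync) {
            return Outcome.ACCEPTED_AFTER_SYNC;   // the sync() closed the gap
        }
        // Still ahead after a sync(): something is seriously wrong (for example a
        // corrupted zxid on the client), so fail the client instead of letting it spin.
        return Outcome.CLIENT_FAILED;
    }
}
```

For example, connect(0x12, 0x10, 0x15) would be accepted after the sync, while connect(0x99, 0x10, 0x15) would fail the client outright.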
> On 8/30/2012 10:21 PM, Alexander Shraer wrote:
> Bill,
>  I'm sorry - you were right and I totally quoted the wrong place in the
> code. The code that ensures that a client doesn't "go back in time" by
> connecting to a server that is less up to date than that client is most
> probably this one from ZooKeeperServer.java. I realized it after looking at
> Simon's question on the mailing list today...
>       if (connReq.getLastZxidSeen() > zkDb.dataTree.lastProcessedZxid) {
>             String msg = "Refusing session request for client "
>                 + cnxn.getRemoteSocketAddress()
>                 + " as it has seen zxid 0x"
>                 + Long.toHexString(connReq.getLastZxidSeen())
>                 + " our last zxid is 0x"
>                 + Long.toHexString(getZKDatabase().getDataTreeLastProcessedZxid())
>                 + " client must try another server";
>             // ... (rest of the block not quoted in the message)
>       }
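
[Editor's sketch] The check quoted above can be reduced to a self-contained model to make its behaviour easy to try out. This is a simplified sketch, not the actual ZooKeeper classes: SessionGate, admit and RefusedException are hypothetical names, and the message text is shortened from the quoted one.

```java
// Hypothetical, simplified model of the quoted check; not ZooKeeper source code.
final class SessionGate {
    /** Thrown when the client has seen a newer zxid than this server has processed. */
    static final class RefusedException extends RuntimeException {
        RefusedException(String msg) {
            super(msg);
        }
    }

    private final long lastProcessedZxid;

    SessionGate(long lastProcessedZxid) {
        this.lastProcessedZxid = lastProcessedZxid;
    }

    /** Admits the client unless it is ahead of this server, mirroring the quoted comparison. */
    void admit(long clientLastZxidSeen) {
        if (clientLastZxidSeen > lastProcessedZxid) {
            throw new RefusedException("Refusing session request as client has seen zxid 0x"
                + Long.toHexString(clientLastZxidSeen)
                + " our last zxid is 0x" + Long.toHexString(lastProcessedZxid)
                + " client must try another server");
        }
    }
}
```

A client whose last seen zxid is equal to or lower than the server's is admitted; only a strictly larger value is refused, which is what prevents the client from "going back in time".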
> On Mon, Aug 27, 2012 at 10:22 AM, Bill Bridge <[EMAIL PROTECTED]> wrote:
>> Alex,
>> You certainly know the code much better than I, so I may be mistaken
>> here. It looks to me like waitForEpochAck() is about changes in the set of
>> peers, and is not related to client connect/disconnects. I do not see how
>> this would be called if a client disconnected due to some problem of his
>> own, such as being too slow to heartbeat, then reconnected to a different peer or
>> observer.
>> You suggest that a reconnecting client should ensure the new server has
>> seen all transactions that the client has seen. This sounds like the right
>> thing to do. This would certainly eliminate the race condition I
>> postulated. This sounds like the kind of thing someone would have already
>> thought of. If this is not already done then it would be a good change to
>> make. I do not know where the code to do that would be. It could be part of
>> the server reconnect code or it could be a sync() in the client library.
>> If Mattias's code creates a new session when reconnecting, rather than
>> reconnecting to the same session, then he could have the problem described
>> even if reconnect ensures the client is not ahead of the server. He could
>> fix this either by reconnecting to the same session, or simply doing a
>> sync() when necessary.
>> Thanks,
>> Bill
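
[Editor's sketch] The last point in the message above, that a client starting a new session may land on a server that has not yet applied a create and can fix this with a sync(), can be illustrated with a toy model. Everything here is an assumption for illustration: LaggingFollower is a hypothetical class, the leader's history is reduced to an array of created paths, and sync() is modelled as a blocking catch-up (the real client-side sync() call is asynchronous and takes a callback).

```java
// Toy model, not ZooKeeper code: a follower that has applied only a prefix of the leader's log.
final class LaggingFollower {
    private final String[] leaderLog; // leaderLog[i] = path created by the (i+1)-th transaction
    private int applied;              // how many of those transactions this follower has applied

    LaggingFollower(String[] leaderLog, int applied) {
        this.leaderLog = leaderLog.clone();
        this.applied = applied;
    }

    /** Local read: only sees what this follower has applied so far. */
    boolean exists(String path) {
        for (int i = 0; i < applied; i++) {
            if (leaderLog[i].equals(path)) {
                return true;
            }
        }
        return false;
    }

    /** Stand-in for sync(): catch up with the leader before serving further reads. */
    void sync() {
        applied = leaderLog.length;
    }
}
```

With a log of {"/a", "/b"} and only one transaction applied, exists("/b") is false on the follower even though the create has completed on the leader; after sync() the same read returns true. A client that reconnects to its existing session instead is protected by the last-zxid check and does not need the extra sync().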
>> On 8/24/2012 6:11 PM, Alexander Shraer wrote:
>>> Bill,  if I understand correctly this shouldn't be possible - the
>>> client will not be able to connect to a server that is
>>> less up-to-date than that same client. So if the create completed at
>>> the client before it disconnects the new server will have to know