Zookeeper, mail # user - devops/admin/client question: What do you do when you rollback?

Re: devops/admin/client question: What do you do when you rollback?
Vishal Kher 2011-08-07, 19:01
Hi Camille,

Can you share the kind of problems you were seeing on the servers that
forced you to roll back the cluster?

Thanks.
-Vishal

On Thu, Aug 4, 2011 at 1:29 PM, Fournier, Camille F. <[EMAIL PROTECTED]> wrote:

> We had an issue here the other day where the ZK servers were running
> poorly, and in an effort to get them healthy again we ended up rolling back
> the cluster state. While this was, in retrospect, not the right solution to
> the problem we were facing, it exposed another problem: many of our
> clients couldn't reconnect with their sessions because their zxid was too
> high (expected), but the error they got when trying to reconnect was just a
> vanilla disconnected error. As a result, most of our clients had to be
> bounced.
>
> Aside from trying hard to never roll back the cluster state, does anyone
> have a way of dealing with this situation if it occurs? Should we consider
> enhancing the error sent to the client so we could detect that we were
> ahead of the quorum's zxid and react sensibly? Alternatively, since we were
> sending a sessionId along with the zxid, perhaps the server could check
> whether the sessionId exists before checking the zxid. That would send an
> expired-session signal, which my client code could handle cleanly.
>
> Any ideas or suggestions would be welcome.
>
> C
>
>
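To make the two suggestions in the quoted message concrete, here is a minimal sketch of the reconnect decision as the message describes it. It is a simplified model, not ZooKeeper's actual server code. It assumes the server compares the client's last-seen zxid with its own last applied zxid before looking at the session, and that a refusal reaches the client only as a plain disconnect. `Reply`, `ServerState`, `handle_reconnect`, and the policy names are all hypothetical. The `current` policy models the reported behavior, `distinct_error` models a dedicated "zxid ahead" error, and `session_first` models checking the sessionId first.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Set


class Reply(Enum):
    CONNECTED = "connected"
    DISCONNECTED = "disconnected"        # connection dropped with no reason given
    ZXID_AHEAD = "zxid_ahead"            # hypothetical dedicated error (first proposal)
    SESSION_EXPIRED = "session_expired"  # client can cleanly start a new session


# Hypothetical policy names for this model only.
POLICIES = ("current", "distinct_error", "session_first")


@dataclass
class ServerState:
    last_zxid: int  # last transaction id this (possibly rolled-back) server has applied
    live_sessions: Set[int] = field(default_factory=set)


def handle_reconnect(server: ServerState, session_id: int,
                     client_last_zxid: int, policy: str = "current") -> Reply:
    """Model how a server might answer a reconnecting client under each policy."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    # Second proposal: look up the session before comparing zxids.
    if policy == "session_first" and session_id not in server.live_sessions:
        return Reply.SESSION_EXPIRED
    # The client has seen newer state than this server holds (e.g. after a rollback).
    if client_last_zxid > server.last_zxid:
        if policy == "distinct_error":
            return Reply.ZXID_AHEAD
        # Otherwise, as reported in the message, the connection is simply dropped.
        return Reply.DISCONNECTED
    if session_id not in server.live_sessions:
        return Reply.SESSION_EXPIRED
    return Reply.CONNECTED


if __name__ == "__main__":
    # Illustrative numbers: after a rollback the server's last zxid (100) is behind
    # what the client saw (150), and the client's session 0x1 is unknown to the server.
    rolled_back = ServerState(last_zxid=100, live_sessions={0x2})
    for policy in POLICIES:
        print(policy, handle_reconnect(rolled_back, 0x1, 150, policy).value)
```

In this model, only `current` leaves the client with nothing to act on, which matches clients that had to be bounced. Note that under `session_first` a client whose session survived the rollback still reaches the zxid check and is disconnected, so that ordering helps only clients whose sessions the rolled-back servers no longer know about.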