Zookeeper user mailing list


Re: question about ZK robustness
Agreed with Chang on all fronts. I will repro the problem and upload logs.

2010/12/1 Chang Song <[EMAIL PROTECTED]>

>
> I think it is not too difficult to reproduce.
> Just create a 3-node ensemble and have some clients create ephemeral nodes.
> Then kill one of the ensemble servers with kill -9.
> I don't remember whether it was the leader or a follower.
>
> Then, once you see those ephemeral nodes gone, restart the killed server's
> Java process.
>
> I think I saw this happen twice when I repeated the same experiment
> multiple times.
>
> I am not trying to create FUD around Zookeeper. Actually, it is the exact
> opposite. I fell in love with Zookeeper, and I still am. I am using
> Zookeeper for our production system. In fact, it is THE only Java
> solution I believe in. Really.
>
> I just couldn't find time to reproduce and report a bug.
>
> Chang
>
>
> On Dec 1, 2010, at 11:08 PM, Fournier, Camille F. [Tech] wrote:
>
> > Would love to hear more about your ensemble settings to try and
> > recreate this issue. Would be a very bad thing for my deployment as
> > well...
> >
> > Camille
> >
> > ----- Original Message -----
> > From: Chang Song <[EMAIL PROTECTED]>
> > To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> > Cc: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> > Sent: Wed Dec 01 08:09:30 2010
> > Subject: Re: question about ZK robustness
> >
> >
> > Ted,
> >
> > I have seen inconsistency between different ensemble servers when we did
> > some torture testing.
> >
> > I killed the Java process with kill -9 on one ensemble server and
> > restarted it, and saw that ephemeral nodes that had disappeared from the
> > other two ensemble servers were stuck on the newly restarted server. No
> > matter what I did ("create, sync, get"), the ephemeral nodes did not
> > disappear. I had to remove the log and force a re-sync from scratch.
> >
> > I saw this behavior twice, exactly the same each time. I had about 2000
> > clients connected to the ensemble servers. I had no time to file a bug
> > report, but when I have time to do another round of torture testing, I
> > will definitely file one.
> >
> > This is not data loss, but it is a serious, dead serious inconsistency
> > as far as my application goes. Please let me know if you happen to know
> > of a related bug.
> >
> > Thank you.
> >
> > Chang
> >
> >
> > On Dec 1, 2010, at 1:41 PM, Ted Dunning wrote:
> >
> >> Sure.  Let me know when.  I have learned a bit more from Ben since I
> >> wrote that first bit, so I could amplify the exposition just a bit when
> >> the time comes.
> >>
> >> On Tue, Nov 30, 2010 at 8:07 PM, Mahadev Konar <[EMAIL PROTECTED]> wrote:
> >>
> >>> I meant to say, we can wait a while before we are done moving to the
> >>> new wiki tree.
> >>>
> >
>
>
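For anyone trying to reproduce Chang's report, here is a minimal sketch of the cross-server comparison he describes, written in Python (a choice made here; nothing in the thread prescribes a language or tool). It assumes you have already collected, for each ensemble server, the set of ephemeral znode paths that a client connected only to that server can see. The function name `find_stale_znodes`, the server names `zk1`..`zk3`, and the paths are hypothetical. A path that fewer than a majority of servers report is flagged on every server that still reports it, which matches the symptom of nodes that are gone from two servers but stuck on the restarted one.

```python
from collections import Counter


def find_stale_znodes(listings):
    """Flag znode paths that a server reports but a majority does not.

    listings: mapping of server name -> set of ephemeral znode paths
    observed by a client connected only to that server (hypothetical
    input format; gather it however your client setup allows).
    Returns a mapping of server name -> set of possibly stale paths,
    containing only servers that have at least one such path.
    """
    # Smallest number of servers that forms a majority of the ensemble.
    quorum = len(listings) // 2 + 1
    # Count how many servers report each path.
    counts = Counter(path for paths in listings.values() for path in paths)
    stale = {}
    for server, paths in listings.items():
        extra = {p for p in paths if counts[p] < quorum}
        if extra:
            stale[server] = extra
    return stale


if __name__ == "__main__":
    # Illustrative data only: zk3 plays the server restarted after kill -9.
    observed = {
        "zk1": {"/locks/a"},
        "zk2": {"/locks/a"},
        "zk3": {"/locks/a", "/locks/b", "/locks/c"},
    }
    # Expect zk3 to be flagged with /locks/b and /locks/c.
    print(find_stale_znodes(observed))
```

Followers can legitimately lag behind the leader for a short time, so a single mismatch is not proof of a bug. Issuing a sync before reading from each server, as Chang tried, narrows that window; a path that stays flagged after that is the kind of persistent divergence reported in this thread.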