Zookeeper, mail # user - Re: question about ZK robustness


Re: question about ZK robustness
Vishal Kher 2010-12-07, 09:23
See https://issues.apache.org/jira/browse/ZOOKEEPER-919

2010/12/2 Benjamin Reed <[EMAIL PROTECTED]>

> Chang, this is indeed a serious bug. It would be great if we could
> reproduce it reliably. Could you confirm the version of the code you are
> using? Could you include enough detail that we could try to reproduce it
> on our cluster?
>
> thanx
> ben
>
> On 12/01/2010 07:05 AM, Vishal Kher wrote:
> > Agreed with Chang on all fronts. I will repro the problem and upload
> > logs.
> >
> > 2010/12/1 Chang Song <[EMAIL PROTECTED]>
> >
> >> I think it is not too difficult to reproduce.
> >> Just create a 3-node ensemble, and have some clients create ephemeral
> >> nodes.
> >> Then kill one of the ensemble servers with kill -9.
> >> I don't remember whether it was a leader or a follower.
> >>
> >> Then, once you see those ephemeral nodes gone, restart the killed
> >> server's Java process.
> >>
> >> I think I have seen this happen twice when I repeated this same
> >> experiment multiple times.
> >>
> >> I am not trying to create FUD around Zookeeper. Actually, it is the
> >> exact opposite.
> >> I fell in love with Zookeeper, and I still am. I am using Zookeeper for
> >> our production system.
> >> In fact, it is THE only Java solution I believe in. Really.
> >>
> >> I just couldn't find time to reproduce and report a bug.
> >>
> >> Chang
> >>
> >>
> >> On Dec 1, 2010, at 11:08 PM, Fournier, Camille F. [Tech] wrote:
> >>
> >>> Would love to hear more about your ensemble settings to try and
> >>> recreate this issue. Would be a very bad thing for my deployment as
> >>> well...
> >>> Camille
> >>>
> >>> ----- Original Message -----
> >>> From: Chang Song <[EMAIL PROTECTED]>
> >>> To: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> >>> Cc: [EMAIL PROTECTED] <[EMAIL PROTECTED]>
> >>> Sent: Wed Dec 01 08:09:30 2010
> >>> Subject: Re: question about ZK robustness
> >>>
> >>>
> >>> Ted.
> >>>
> >>> I have seen inconsistency between different ensemble servers when we
> >>> did some torture testing.
> >>>
> >>> I killed the Java process with kill -9 on one ensemble server and
> >>> restarted it, and saw that ephemeral nodes that had disappeared from the
> >>> other two ensemble servers were stuck on the newly restarted server. No
> >>> matter what I did ("create, sync, get"), the ephemeral nodes did not
> >>> disappear. I had to remove the log and force a re-sync from scratch.
> >>> I have seen this behavior twice, exactly the same both times. I had
> >>> about 2000 clients connected to the ensemble servers. I had no time to
> >>> file a bug report, but when I have time to do another round of torture
> >>> testing, I will definitely file one.
> >>>
> >>> This is not data loss, but it is a serious, dead serious inconsistency
> >>> as far as my application goes.
> >>> Please let me know if you happen to know of a related bug.
> >>>
> >>> Thank you.
> >>>
> >>> Chang
> >>>
> >>>
> >>> On Dec 1, 2010, at 1:41 PM, Ted Dunning wrote:
> >>>
> >>>> Sure.  Let me know when.  I have learned a bit more from Ben since I
> >>>> wrote that first bit so I could amplify the exposition
> >>>> just a bit when the time comes.
> >>>>
> >>>> On Tue, Nov 30, 2010 at 8:07 PM, Mahadev Konar <[EMAIL PROTECTED]>
> >>>> wrote:
> >>>>> I meant to say, we can wait a while before we are done moving to the
> >>>>> new wiki tree.
> >>>>>
> >>
>
>
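
For anyone trying the reproduction described in this thread (3-node ensemble, clients creating ephemeral nodes, kill -9 on one server, then restart), the check for "stuck" ephemeral nodes might be sketched roughly as below. This is only an illustration, not code from the thread or from ZooKeeper: the function name, the server labels, and the znode paths are all made up, and collecting each server's list of ephemeral nodes (by connecting a client to each server in turn) is assumed to happen elsewhere. A path seen on only one server may also just mean the others are briefly behind, so a real check would want to repeat the comparison after a sync.

```python
from typing import Dict, Set


def find_stale_ephemerals(views: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per server, the ephemeral paths that no other server reports.

    `views` maps a server label to the set of ephemeral znode paths that
    server returned. How the sets are gathered is outside this sketch.
    """
    stale: Dict[str, Set[str]] = {}
    for server, paths in views.items():
        # Sets reported by every server except the one being checked.
        others = [p for s, p in views.items() if s != server]
        if not others:
            continue  # nothing to compare against
        seen_elsewhere = set().union(*others)
        only_here = paths - seen_elsewhere
        if only_here:
            stale[server] = only_here
    return stale


# Hypothetical snapshot: zk3 was killed with kill -9 and restarted, and it
# still reports an ephemeral node that the other two servers have dropped.
views = {
    "zk1": {"/app/workers/w-1"},
    "zk2": {"/app/workers/w-1"},
    "zk3": {"/app/workers/w-1", "/app/workers/w-7"},
}
print(find_stale_ephemerals(views))  # {'zk3': {'/app/workers/w-7'}}
```

An empty result means every ephemeral node reported by one server was also reported by at least one other, which is what a healthy ensemble should converge to.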