

Re: leader election, scheduled tasks, losing leadership
How are you going to ensure atomicity? I mean, if your processor dies in the
middle of the operation, how do you know whether it is done or not?
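
One way to make the question concrete is a per-recipient completion record, so that whoever picks up the task after a crash can tell which units are already done. The following is only a minimal sketch under stated assumptions: `ReportRun`, `CompletionLog`, `claim`, and `sendReport` are hypothetical names, the log is an in-memory stand-in for a durable store (a database table, for instance), and the `outbox` list stands in for the real mail call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Sketch: a completion log lets a restarted run skip work that was already claimed. */
class ReportRun {

    /** Hypothetical stand-in for a durable store; kept in memory for illustration. */
    static final class CompletionLog {
        private final Set<String> done = new LinkedHashSet<>();

        /** Returns true only for the first caller to claim this key. */
        synchronized boolean claim(String key) {
            return done.add(key);
        }
    }

    /** Sends the report for one date, skipping recipients already claimed. */
    static int sendReport(String date, List<String> recipients,
                          CompletionLog log, List<String> outbox) {
        int sent = 0;
        for (String recipient : recipients) {
            String key = date + ":" + recipient;
            // Claim first, then send: a crash between the two steps drops
            // that one email, but it can never be sent twice.
            if (log.claim(key)) {
                outbox.add(recipient); // assumption: stands in for the real mail call
                sent++;
            }
        }
        return sent;
    }

    public static void main(String[] args) {
        CompletionLog log = new CompletionLog();
        List<String> outbox = new ArrayList<>();
        List<String> recipients = Arrays.asList("a@example.com", "b@example.com");

        System.out.println(sendReport("2012-12-09", recipients, log, outbox)); // 2
        // A second run for the same date (e.g. by a new leader) sends nothing.
        System.out.println(sendReport("2012-12-09", recipients, log, outbox)); // 0
    }
}
```

Claiming before sending gives at-most-once delivery; sending before recording would give at-least-once instead. Neither ordering is atomic by itself, which is the point of the question.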

--
Best regards,
Vitalii Tymchyshyn
On 10 Dec 2012 00:11, "Eric Pederson" <[EMAIL PROTECTED]> wrote:

> Also sometimes the app leadership (via LeaderLatch) will get lost - I will
> follow up about this on the Curator list:
> https://gist.github.com/4247226
>
> So back to my previous question, what is the best way to implement the
> "fence"?
>
> -- Eric
>
>
>
> On Sun, Dec 9, 2012 at 4:42 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
>
> > The irony is that I am using leader election to convert non-idempotent
> > operations into idempotent operations :)  For example, once a night a
> > report is emailed out to a set of addresses. We don't want the report to
> > go to the same person more than once.
> >
> > Prior to using leader election, one of the cluster members was designated
> > as the scheduled task "leader" using a system property. But if that
> > cluster member crashed, it required a manual operation to fail over the
> > "leader" responsibility to another cluster member. I considered using
> > app-specific techniques to make the scheduled tasks idempotent (for
> > example, using "select for update" / database locking), but I wanted a
> > general solution and I needed clustering support for other reasons
> > (cluster membership, etc.).
> >
> > Anyway, here is the code that I'm using.
> >
> > Application startup (using Curator LeaderLatch):
> > https://gist.github.com/3936162
> > https://gist.github.com/3935895
> > https://gist.github.com/3935889
> >
> > ClusterStatus:
> > https://gist.github.com/3943149
> > https://gist.github.com/3935861
> >
> > Scheduled task:
> > https://gist.github.com/4246388
> >
> > In the last gist the "distribute" scheduled task is run every 30 seconds.
> > It checks clusterStatus.isLeader to see if the current cluster member is
> > the leader before running the real method (which sends email).
> > clusterStatus() calls methods on LeaderLatch.
> >
> > Here is the output that I am seeing if I kill the ZK quorum leader and the
> > app cluster member that was the leader loses its LeaderLatch leadership to
> > another cluster member:
> > https://gist.github.com/4247058
> >
> >
> > -- Eric
> >
> >
> >
> > On Sun, Dec 9, 2012 at 12:30 AM, Henry Robinson <[EMAIL PROTECTED]> wrote:
> >
> >> On 8 December 2012 21:18, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
> >>
> >> > If your ConnectionStateListener gets SUSPENDED or LOST you've lost
> >> > connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
> >> > connection to manage a node that denotes the process is running or not.
> >> > Only 1 VM at a time will be running the process. That process can watch
> >> > for SUSPENDED/LOST and wind down the task.
> >> >
> >> >
> >> My point is that by the time that VM sees SUSPENDED/LOST, another VM may
> >> have been elected leader and have started running another process.
> >>
> >> It's a classic problem - you need some mechanism to fence a node that
> >> thinks it's the leader, but isn't and hasn't got the memo yet. The way
> >> around the problem is to either ensure that no work is done by you once
> >> you are no longer the leader (perhaps by checking every time you want to
> >> do work), or that the work you do does not affect the system (e.g. by
> >> idempotent work units).
> >>
> >> ZK itself solves this internally by checking that it has a quorum for
> >> every operation, which forces an ordering between the disconnection event
> >> and trying to do something that relies upon being the leader. Other
> >> systems forcibly terminate old leaders before allowing a new leader to
> >> take the throne.
> >>
> >> Henry
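
The "fence" Henry describes is often built by having the shared resource itself reject work from a stale leader, rather than trusting each node's own view of leadership. The sketch below is an illustration only, not something taken from the thread or from Curator: `FencingDemo`, `FencedOutbox`, and `submit` are hypothetical names, and it assumes each newly elected leader is handed an epoch number that only ever increases (how that number is obtained is left open here).

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: a resource that fences off writes carrying an out-of-date leader epoch. */
class FencingDemo {

    /** Hypothetical shared resource; remembers the highest epoch it has accepted. */
    static final class FencedOutbox {
        private long highestEpoch = -1;
        private final List<String> accepted = new ArrayList<>();

        /** Accepts the message only if the caller's epoch is not older than one already seen. */
        synchronized boolean submit(long epoch, String message) {
            if (epoch < highestEpoch) {
                return false; // stale leader that "hasn't got the memo yet"
            }
            highestEpoch = epoch;
            accepted.add(message);
            return true;
        }

        /** Returns a copy of the messages accepted so far. */
        synchronized List<String> accepted() {
            return new ArrayList<>(accepted);
        }
    }

    public static void main(String[] args) {
        FencedOutbox outbox = new FencedOutbox();
        System.out.println(outbox.submit(1, "report from leader A")); // true
        System.out.println(outbox.submit(2, "report from leader B")); // true
        // Leader A was deposed but has not yet seen SUSPENDED/LOST:
        System.out.println(outbox.submit(1, "late report from A"));   // false
    }
}
```

This only helps when the resource being written to can enforce the check. A plain mail server cannot, so in the email case the check would have to sit in whatever component the task goes through before mail is actually sent.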
> >>
> >>
> >> > > You can't assume that the notification is received locally before
> >> > > another leader election finishes elsewhere
> >> > Which notification? The ConnectionStateListener is an abstraction on