Zookeeper, mail # user - leader election, scheduled tasks, losing leadership


Eric Pederson        2012-12-09, 04:17
Jordan Zimmerman     2012-12-09, 04:25
Eric Pederson        2012-12-09, 04:49
Jordan Zimmerman     2012-12-09, 04:52
Eric Pederson        2012-12-09, 04:54
Jordan Zimmerman     2012-12-09, 04:57
Eric Pederson        2012-12-09, 04:56
Jordan Zimmerman     2012-12-09, 05:00
Henry Robinson       2012-12-09, 05:02
Jordan Zimmerman     2012-12-09, 05:04
Henry Robinson       2012-12-09, 05:12
Jordan Zimmerman     2012-12-09, 05:18
Henry Robinson       2012-12-09, 05:30
Jordan Zimmerman     2012-12-09, 05:41
Eric Pederson        2012-12-09, 21:42
Eric Pederson        2012-12-09, 22:10
Vitalii Tymchyshyn   2012-12-10, 06:49
Re: leader election, scheduled tasks, losing leadership
Eric Pederson 2012-12-10, 11:52
It depends on the scheduled task.  Some have status fields in the database
that indicate new/in-progress/done, but others do not.
-- Eric
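
For the tasks that do have a status field, that column can double as the atomicity check Vitalii raises in the quoted message below: a conditional UPDATE from new to in-progress succeeds on exactly one node, and a crash mid-task leaves the row in-progress for inspection. A minimal sketch, assuming a hypothetical report_runs table with report_date and status columns (the table and method names are illustrative, not from the thread):

import java.sql.Connection;
import java.sql.Date;
import java.sql.PreparedStatement;
import java.sql.SQLException;

public class ReportRunClaim {

    // Returns true only for the node whose UPDATE flips the row from NEW to
    // IN_PROGRESS; every other node sees 0 rows updated and skips the task.
    static boolean tryClaim(Connection conn, Date reportDate) throws SQLException {
        String sql = "UPDATE report_runs SET status = 'IN_PROGRESS' "
                   + "WHERE report_date = ? AND status = 'NEW'";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDate(1, reportDate);
            return ps.executeUpdate() == 1;
        }
    }

    // Called only after the report has actually been sent.
    static void markDone(Connection conn, Date reportDate) throws SQLException {
        String sql = "UPDATE report_runs SET status = 'DONE' WHERE report_date = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setDate(1, reportDate);
            ps.executeUpdate();
        }
    }
}

If the node dies between tryClaim and markDone, the row stays IN_PROGRESS, which is exactly the ambiguous case Vitalii describes; tasks without a status column have no such marker, hence the need for the leadership "fence" discussed in the quoted thread.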

On Mon, Dec 10, 2012 at 1:49 AM, Vitalii Tymchyshyn <[EMAIL PROTECTED]> wrote:

> How are you going to ensure atomicity? I mean, if your processor dies in the
> middle of the operation, how do you know if it is done or not?
>
> --
> Best regards,
> Vitalii Tymchyshyn
> On 10 Dec 2012, 00:11, "Eric Pederson" <[EMAIL PROTECTED]> wrote:
>
> > Also sometimes the app leadership (via LeaderLatch) will get lost - I will
> > follow up about this on the Curator list:
> > https://gist.github.com/4247226
> >
> > So back to my previous question, what is the best way to implement the
> > "fence"?
> >
> > -- Eric
> >
> >
> > On Sun, Dec 9, 2012 at 4:42 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
> >
> > > The irony is that I am using leader election to convert non-idempotent
> > > operations into idempotent operations :)  For example, once a night a
> > > report is emailed out to a set of addresses.  We don't want the report
> > > to go to the same person more than once.
> > >
> > > Prior to using leader election one of the cluster members was designated
> > > as the scheduled task "leader" using a system property.  But if that
> > > cluster member crashed it required a manual operation to fail over the
> > > "leader" responsibility to another cluster member.  I considered using
> > > app-specific techniques to make the scheduled tasks idempotent (for
> > > example using "select for update" / database locking) but I wanted a
> > > general solution and I needed clustering support for other reasons
> > > (cluster membership, etc).
> > >
> > > Anyway, here is the code that I'm using.
> > >
> > > Application startup (using Curator LeaderLatch):
> > > https://gist.github.com/3936162
> > > https://gist.github.com/3935895
> > > https://gist.github.com/3935889
> > >
> > > ClusterStatus:
> > > https://gist.github.com/3943149
> > > https://gist.github.com/3935861
> > >
> > > Scheduled task:
> > > https://gist.github.com/4246388
> > >
> > > In the last gist the "distribute" scheduled task is run every 30 seconds.
> > > It checks clusterStatus.isLeader to see if the current cluster member is
> > > the leader before running the real method (which sends email).
> > > clusterStatus() calls methods on LeaderLatch.
> > >
> > > Here is the output that I am seeing if I kill the ZK quorum leader and
> > > the app cluster member that was the leader loses its LeaderLatch
> > > leadership to another cluster member:
> > > https://gist.github.com/4247058
> > >
> > > -- Eric
> > >
> > >
> > > On Sun, Dec 9, 2012 at 12:30 AM, Henry Robinson <[EMAIL PROTECTED]> wrote:
> > >
> > > > On 8 December 2012 21:18, Jordan Zimmerman <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > If your ConnectionStateListener gets SUSPENDED or LOST you've lost
> > > > > connection to ZooKeeper. Therefore you cannot use that same ZooKeeper
> > > > > connection to manage a node that denotes the process is running or
> > > > > not. Only 1 VM at a time will be running the process. That process
> > > > > can watch for SUSPENDED/LOST and wind down the task.
> > > > >
> > > > My point is that by the time that VM sees SUSPENDED/LOST, another VM
> > > > may have been elected leader and have started running another process.
> > > >
> > > > It's a classic problem - you need some mechanism to fence a node that
> > > > thinks it's the leader, but isn't and hasn't got the memo yet. The way
> > > > around the problem is to either ensure that no work is done by you once
> > > > you are no longer the leader (perhaps by checking every time you want
> > > > to do work), or that the work you do does not affect the system (e.g.
> > > > by idempotent work units).
> > > >
> > > > ZK itself solves this internally by checking that it has a quorum for
> > > > every operation, which forces an ordering between the disconnection
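
For reference, a minimal sketch of the pattern Eric describes above (his actual code is in the linked gists): each cluster member holds a LeaderLatch, and the scheduled task re-checks hasLeadership() immediately before doing any work. The connection string, latch path, and distributeReports() are illustrative assumptions; the package names are the Apache Curator ones (the 2012-era Netflix releases used com.netflix.curator instead).

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.leader.LeaderLatch;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ScheduledLeaderTask {

    public static void main(String[] args) throws Exception {
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "localhost:2181", new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One latch per cluster member; whichever member holds the latch is the
        // scheduled-task "leader".
        LeaderLatch latch = new LeaderLatch(client, "/myapp/scheduler-leader");
        latch.start();

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            // The fence check: ask the latch at the last moment before working.
            if (latch.hasLeadership()) {
                distributeReports();
            }
        }, 0, 30, TimeUnit.SECONDS);
    }

    private static void distributeReports() {
        // hypothetical non-idempotent task: send the nightly report emails
    }
}

As Henry points out, this check narrows the window but does not close it: leadership can still be lost between the hasLeadership() call and the send, so truly non-idempotent work needs an external fence such as the status-column claim sketched earlier.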
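
And a sketch of Jordan's suggestion to watch the same Curator connection and wind the task down on SUSPENDED or LOST; the class name and the way the flag is consumed are assumptions, not code from the thread:

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.state.ConnectionState;
import org.apache.curator.framework.state.ConnectionStateListener;

public class ConnectionGuard implements ConnectionStateListener {

    private volatile boolean connectionInDoubt = false;

    @Override
    public void stateChanged(CuratorFramework client, ConnectionState newState) {
        if (newState == ConnectionState.SUSPENDED || newState == ConnectionState.LOST) {
            // The session may have expired, so another member could already be
            // leader: refuse new work and let in-flight work wind down.
            connectionInDoubt = true;
        } else if (newState == ConnectionState.RECONNECTED) {
            connectionInDoubt = false;
        }
    }

    public boolean connectionInDoubt() {
        return connectionInDoubt;
    }
}

The guard would be registered with client.getConnectionStateListenable().addListener(guard) and consulted alongside hasLeadership() before each scheduled run.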
Vitalii Tymchyshyn   2012-12-11, 20:09
Eric Pederson        2012-12-12, 00:54
Henry Robinson       2012-12-09, 04:59
Eric Pederson        2012-12-09, 05:00