Zookeeper >> mail # user >> leader election, scheduled tasks, losing leadership


Eric Pederson 2012-12-09, 04:17
Jordan Zimmerman 2012-12-09, 04:25
Eric Pederson 2012-12-09, 04:49
Jordan Zimmerman 2012-12-09, 04:52
Eric Pederson 2012-12-09, 04:54
Jordan Zimmerman 2012-12-09, 04:57
Eric Pederson 2012-12-09, 04:56
Jordan Zimmerman 2012-12-09, 05:00
Henry Robinson 2012-12-09, 05:02
Jordan Zimmerman 2012-12-09, 05:04
Henry Robinson 2012-12-09, 05:12
Jordan Zimmerman 2012-12-09, 05:18
Henry Robinson 2012-12-09, 05:30
Jordan Zimmerman 2012-12-09, 05:41
Eric Pederson 2012-12-09, 21:42
Eric Pederson 2012-12-09, 22:10
Vitalii Tymchyshyn 2012-12-10, 06:49
Eric Pederson 2012-12-10, 11:52
Vitalii Tymchyshyn 2012-12-11, 20:09
Re: leader election, scheduled tasks, losing leadership
Thanks Vitalii!  I will think about this and ask if I have any questions.
-- Eric

On Tue, Dec 11, 2012 at 3:09 PM, Vitalii Tymchyshyn <[EMAIL PROTECTED]> wrote:

> I am asking because you have this "at most once" vs. "at least once" problem.
> I don't think you can have "exactly once" unless your jobs are transactional
> and you can synchronize your transaction commits with ZooKeeper (ideally
> with two-phase commit). So you need to decide which of the two you want.
>
> What I'd recommend is a queue-like architecture rather than a lock-based
> one. This way you can:
> a) Process tasks in parallel
> b) Try increasing timeouts to be larger than the maximum task time.
>     E.g. set the timeout to one hour. This would mean that a running task
>     is restarted an hour after its client fails.
>
> But this would mean moving from the database to ZooKeeper for task
> status/queueing. In my opinion this would be good, as the database is a
> SPOF for you.
>
> Best regards, Vitalii Tymchyshyn
>
>
> 2012/12/10 Eric Pederson <[EMAIL PROTECTED]>
>
> > It depends on the scheduled task.  Some have status fields in the database
> > that indicate new/in-progress/done, but others do not.
> >
> >
> > -- Eric
> >
> >
> >
> > On Mon, Dec 10, 2012 at 1:49 AM, Vitalii Tymchyshyn <[EMAIL PROTECTED]> wrote:
> >
> > > How are you going to ensure atomicity? I mean, if your processor dies in the
> > > middle of the operation, how do you know whether it is done or not?
> > >
> > > --
> > > Best regards,
> > > Vitalii Tymchyshyn
> > > On 10 Dec 2012 at 00:11, "Eric Pederson" <[EMAIL PROTECTED]> wrote:
> > >
> > > > Also sometimes the app leadership (via LeaderLatch) will get lost - I will
> > > > follow up about this on the Curator list:
> > > > https://gist.github.com/4247226
> > > >
> > > > So back to my previous question, what is the best way to implement the
> > > > "fence"?
> > > >
> > > > -- Eric
> > > >
> > > >
> > > >
> > > > On Sun, Dec 9, 2012 at 4:42 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > The irony is that I am using leader election to convert non-idempotent
> > > > > operations into idempotent operations :)   For example, once a night a
> > > > > report is emailed out to a set of addresses.   We don't want the report to
> > > > > go to the same person more than once.
> > > > >
> > > > > Prior to using leader election, one of the cluster members was designated
> > > > > as the scheduled task "leader" using a system property.  But if that
> > > > > cluster member crashed, it required a manual operation to fail over the
> > > > > "leader" responsibility to another cluster member.   I considered using
> > > > > app-specific techniques to make the scheduled tasks idempotent (for example
> > > > > using "select for update" / database locking) but I wanted a general
> > > > > solution and I needed clustering support for other reasons (cluster
> > > > > membership, etc).
> > > > >
> > > > > Anyway, here is the code that I'm using.
> > > > >
> > > > > Application startup (using Curator LeaderLatch):
> > > > > https://gist.github.com/3936162
> > > > > https://gist.github.com/3935895
> > > > > https://gist.github.com/3935889
> > > > >
> > > > > ClusterStatus:
> > > > > https://gist.github.com/3943149
> > > > > https://gist.github.com/3935861
> > > > >
> > > > > Scheduled task:
> > > > > https://gist.github.com/4246388
> > > > >
> > > > > In the last gist the "distribute" scheduled task is run every 30 seconds.
> > > > > It checks clusterStatus.isLeader to see if the current cluster member is
> > > > > the leader before running the real method (which sends email).
> > > > > clusterStatus() calls methods on LeaderLatch.
> > > > >
> > > > > Here is the output that I am seeing if I kill the ZK quorum leader and the
> > > > > app cluster member that was the leader loses its LeaderLatch leadership to
> > > > > another cluster member:
> > > > > https://gist.github.com/4247058
> > > > >
> > > > >
> > > > > -- Eric
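
For readers without access to the gists, the "check leadership, then run the real method" pattern described in the oldest message above can be sketched roughly as follows. This is not the code from the gists: LeaderGatedTask is a hypothetical name, the BooleanSupplier stands in for something like Curator's LeaderLatch.hasLeadership(), and the interval is shortened from 30 seconds so the demo finishes quickly. Note that the check only narrows the window. Leadership can still be lost after the check and before the work completes, which is why the thread keeps coming back to the "fence" question.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical wrapper: runs the real work only while this member
// believes it is the leader at the moment the task fires.
final class LeaderGatedTask implements Runnable {
    // Assumption: in a Curator-based app this could be latch::hasLeadership.
    private final BooleanSupplier isLeader;
    private final Runnable work;

    LeaderGatedTask(BooleanSupplier isLeader, Runnable work) {
        this.isLeader = isLeader;
        this.work = work;
    }

    @Override
    public void run() {
        if (!isLeader.getAsBoolean()) {
            return; // not the leader right now, skip this tick
        }
        // Leadership may be lost from here on; this check is not a fence.
        work.run();
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger runs = new AtomicInteger();
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        // Always-leader supplier and a counter instead of real email sending.
        scheduler.scheduleAtFixedRate(
                new LeaderGatedTask(() -> true, runs::incrementAndGet),
                0, 100, TimeUnit.MILLISECONDS);
        Thread.sleep(350);
        scheduler.shutdownNow();
        System.out.println("runs while leader: " + runs.get());
    }
}
```

Passing the leadership check in as a supplier keeps the task testable without a ZooKeeper ensemble; it does nothing for the "at most once" vs. "at least once" choice discussed above, which still has to be made per task.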
Henry Robinson 2012-12-09, 04:59
Eric Pederson 2012-12-09, 05:00