Zookeeper >> mail # user >> leader election, scheduled tasks, losing leadership


Re: leader election, scheduled tasks, losing leadership
Thanks Vitalii!  I will think about this and ask if I have any questions.
-- Eric

On Tue, Dec 11, 2012 at 3:09 PM, Vitalii Tymchyshyn <[EMAIL PROTECTED]> wrote:

> I am asking because you have this "at most once" vs. "at least once" problem.
> I don't think you can have "exactly once" unless your jobs are transactional
> and you can synchronize your transaction commits with ZooKeeper (preferably
> using two-phase commit). So you need to decide which of the two you want.
>
> What I'd recommend is a queue-like architecture rather than a lock-based
> one. This way you can:
> a) Process tasks in parallel.
> b) Increase timeouts to be larger than the maximum task time,
>    e.g. one hour. This would mean that a running task is restarted
>    after an hour if the client fails.
>
> But this would mean moving task status/queueing from the database to
> ZooKeeper. In my view that would be good, as the database is a SPOF for you.
>
> Best regards, Vitalii Tymchyshyn
>
>
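The queue-with-timeout idea above can be sketched roughly as follows. This is a minimal single-process model in Python, not Curator or ZooKeeper code: LeaseQueue, Task, and the injected clock are hypothetical names made up for illustration, and a real version would need an atomic claim on the ZooKeeper side (for example a versioned update or an ephemeral node). It only shows how a lease longer than the maximum task time lets another worker pick a task up again after the first worker dies, which is at-least-once behaviour.

```python
import time
from typing import Callable, Dict, Optional


class Task:
    """One unit of scheduled work plus its lease bookkeeping (hypothetical type)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.owner: Optional[str] = None   # worker currently holding the lease
        self.lease_expires = 0.0           # clock value after which the lease is void
        self.done = False


class LeaseQueue:
    """In-memory stand-in for task state that would otherwise live in ZooKeeper."""

    def __init__(self, lease_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.lease_seconds = lease_seconds
        self.clock = clock
        self.tasks: Dict[str, Task] = {}

    def add(self, name: str) -> None:
        self.tasks[name] = Task(name)

    def claim(self, worker: str) -> Optional[Task]:
        """Hand out the first unfinished task that has no live lease."""
        now = self.clock()
        for task in self.tasks.values():
            if not task.done and (task.owner is None or task.lease_expires <= now):
                task.owner = worker
                task.lease_expires = now + self.lease_seconds
                return task
        return None

    def complete(self, worker: str, name: str) -> bool:
        """Mark a task done, but only if the caller still holds a live lease."""
        task = self.tasks[name]
        if task.done or task.owner != worker or self.clock() >= task.lease_expires:
            return False
        task.done = True
        return True
```

With a one-hour lease, a second worker's claim() returns None while the first worker's lease is live, and returns the same task once the hour has passed without a complete() call. A worker that outlives its lease gets False from complete(), so it can tell that someone else may have taken over.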
> 2012/12/10 Eric Pederson <[EMAIL PROTECTED]>
>
> > It depends on the scheduled task.  Some have status fields in the database
> > that indicate new/in-progress/done, but others do not.
> >
> >
> > -- Eric
> >
> >
> >
> > On Mon, Dec 10, 2012 at 1:49 AM, Vitalii Tymchyshyn <[EMAIL PROTECTED]> wrote:
> >
> > > How are you going to ensure atomicity? I mean, if your processor dies in the
> > > middle of the operation, how do you know whether it is done or not?
> > >
> > > --
> > > Best regards,
> > > Vitalii Tymchyshyn
> > > On Dec 10, 2012 at 00:11, "Eric Pederson" <[EMAIL PROTECTED]> wrote:
> > >
> > > > Also, sometimes the app leadership (via LeaderLatch) gets lost. I will
> > > > follow up about this on the Curator list:
> > > > https://gist.github.com/4247226
> > > >
> > > > So back to my previous question: what is the best way to implement the
> > > > "fence"?
> > > >
> > > > -- Eric
> > > >
> > > >
> > > >
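The thread does not spell out what the "fence" should look like, so the following is only one common reading of the term: a fencing token. The sketch assumes each leadership change hands the new leader a strictly larger epoch number; how to obtain such a number from LeaderLatch is not covered here. FencedResource and StaleLeaderError are hypothetical names. The protected resource remembers the newest epoch it has seen and refuses writes from any older one.

```python
class StaleLeaderError(Exception):
    """Raised when a writer presents an epoch older than one already seen."""


class FencedResource:
    """A resource (mail gateway, database row, ...) that tracks the newest epoch."""

    def __init__(self) -> None:
        self.highest_epoch = 0
        self.log = []

    def write(self, epoch: int, payload: str) -> None:
        # Reject a former leader that has not yet noticed it lost leadership.
        if epoch < self.highest_epoch:
            raise StaleLeaderError(
                "epoch %d is older than %d" % (epoch, self.highest_epoch))
        self.highest_epoch = epoch
        self.log.append((epoch, payload))
```

The point of this design is that the check happens at the resource, not in the client. A client-side isLeader check can be out of date by the time the work actually runs, whereas the resource sees every write in order.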
> > > > On Sun, Dec 9, 2012 at 4:42 PM, Eric Pederson <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > The irony is that I am using leader election to convert non-idempotent
> > > > > operations into idempotent operations :)  For example, once a night a
> > > > > report is emailed out to a set of addresses.  We don't want the report to
> > > > > go to the same person more than once.
> > > > >
> > > > > Prior to using leader election, one of the cluster members was designated
> > > > > as the scheduled task "leader" using a system property.  But if that
> > > > > cluster member crashed, it required a manual operation to fail over the
> > > > > "leader" responsibility to another cluster member.  I considered using
> > > > > app-specific techniques to make the scheduled tasks idempotent (for example,
> > > > > using "select for update" / database locking), but I wanted a general
> > > > > solution and I needed clustering support for other reasons (cluster
> > > > > membership, etc.).
> > > > >
> > > > > Anyway, here is the code that I'm using.
> > > > >
> > > > > Application startup (using Curator LeaderLatch):
> > > > > https://gist.github.com/3936162
> > > > > https://gist.github.com/3935895
> > > > > https://gist.github.com/3935889
> > > > >
> > > > > ClusterStatus:
> > > > > https://gist.github.com/3943149
> > > > > https://gist.github.com/3935861
> > > > >
> > > > > Scheduled task:
> > > > > https://gist.github.com/4246388
> > > > >
> > > > > In the last gist the "distribute" scheduled task is run every 30 seconds.
> > > > > It checks clusterStatus.isLeader to see if the current cluster member is
> > > > > the leader before running the real method (which sends email).
> > > > > clusterStatus() calls methods on LeaderLatch.
> > > > >
> > > > > Here is the output that I am seeing if I kill the ZK quorum leader and the
> > > > > app cluster member that was the leader loses its LeaderLatch leadership to
> > > > > another cluster member:
> > > > > https://gist.github.com/4247058
> > > > > https://gist.github.com/4247058
> > > > >
> > > > >
> > > > > -- Eric
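For the nightly report example, the "at most once" versus "at least once" choice raised in this thread comes down to whether the "sent" marker is written before or after the email goes out. Here is a small Python sketch of both orderings with a simulated crash. The function names, the marker set, and the outbox list are all made up for illustration; in the real application the marker would be a database status field or a ZooKeeper node.

```python
class Crash(Exception):
    """Simulates the process dying partway through a task."""


def run_at_most_once(sent_marker: set, outbox: list, recipient: str,
                     crash: bool = False) -> None:
    # Record the attempt first: a crash after this point loses the email,
    # but a retry can never send a duplicate.
    if recipient in sent_marker:
        return
    sent_marker.add(recipient)
    if crash:
        raise Crash()
    outbox.append(recipient)


def run_at_least_once(sent_marker: set, outbox: list, recipient: str,
                      crash: bool = False) -> None:
    # Send first: a crash before the marker is written means the retry
    # sends the email a second time, but it is never lost.
    if recipient in sent_marker:
        return
    outbox.append(recipient)
    if crash:
        raise Crash()
    sent_marker.add(recipient)
```

If the first attempt crashes and a second worker retries, the at-most-once variant ends with an empty outbox and the at-least-once variant ends with the recipient listed twice. Neither gives "exactly once" on its own, which is the point made earlier in the thread.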