Zookeeper user mailing list

leader election, scheduled tasks, losing leadership
Hi all:

In my system I have scheduled tasks that only one cluster member should
run.  I am using the leader election recipe to determine which cluster
member should run the scheduled tasks.

The way it works is that every cluster member runs the scheduler. When a
scheduled job fires, all cluster members execute the same method. It first
checks whether the current node is the leader. If it is, it goes ahead and
runs the task; otherwise the method returns.
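For illustration only, here is a minimal Java sketch of that check-then-run pattern. `LeaderOnlyJob`, `isLeader` and `runIfLeader` are hypothetical names, and `isLeader` stands in for whatever the leader election recipe exposes; it is not a ZooKeeper API.

```java
import java.util.function.BooleanSupplier;

// Sketch of "every member runs the scheduler, only the leader does the work".
// LeaderOnlyJob and isLeader are hypothetical; isLeader would be backed by the election recipe.
class LeaderOnlyJob {
    private final BooleanSupplier isLeader; // answers "is this member the current leader?"
    private final Runnable task;            // the leader-only work

    LeaderOnlyJob(BooleanSupplier isLeader, Runnable task) {
        this.isLeader = isLeader;
        this.task = task;
    }

    // Invoked by the local scheduler on every cluster member when the job fires.
    // Returns true if the task ran on this member.
    boolean runIfLeader() {
        if (!isLeader.getAsBoolean()) {
            return false; // not the leader: return without doing anything
        }
        task.run(); // leadership is checked only here, not while the task runs
        return true;
    }
}
```

Because leadership is checked only once, at the start, nothing stops a new leader from firing the same job while this member is still inside `task.run()`, which is the situation the next paragraph describes.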

The tasks themselves can take anywhere from a few milliseconds to tens of
minutes. While a task is running, the cluster member running it could lose
its leadership. I don't want another cluster member to start a scheduled
leader-only task until the first one has finished.
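To make the race concrete, here is a hypothetical timeline sketch in Java (times in minutes). `OverlapDemo` and its parameters are illustrative only; it assumes each member checks leadership only when a job fires and that leadership moves from member A to member B at `handoverAt`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical timeline: leadership is checked only when a job fires,
// and leadership moves from member A to member B at handoverAt.
class OverlapDemo {
    static final class Run {
        final String member;
        final int start;
        final int end;

        Run(String member, int start, int end) {
            this.member = member;
            this.start = start;
            this.end = end;
        }
    }

    // One run per job firing, executed by whichever member is leader at that moment.
    static List<Run> simulate(int[] jobStarts, int taskMinutes, int handoverAt) {
        List<Run> runs = new ArrayList<>();
        for (int t : jobStarts) {
            String leader = t < handoverAt ? "A" : "B";
            runs.add(new Run(leader, t, t + taskMinutes));
        }
        return runs;
    }

    // Two runs overlap if each starts before the other ends.
    static boolean overlaps(Run a, Run b) {
        return a.start < b.end && b.start < a.end;
    }
}
```

With jobs firing at minutes 0 and 10, a 30-minute task and a handover at minute 5, A runs from 0 to 30 and B from 10 to 40, so both members execute the leader-only task concurrently between minutes 10 and 30.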

At first I considered using an ephemeral znode as a "task in progress" flag
and changing the condition for starting a scheduled task to "if I am the
leader AND no task is currently in progress". However, an ephemeral znode
could get lost the same way the leadership was lost. On the other hand, if I
use a non-ephemeral znode, I need to add logic to detect stale or invalid
"task in progress" nodes (check for staleness, plus try to contact the node
that is running the task to see whether it responds).
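A rough Java sketch of the non-ephemeral option, under stated assumptions: an in-memory map stands in for the persistent znode (this is not the ZooKeeper client API), the owner refreshes a heartbeat timestamp while the task runs, and `ownerResponds` is a hypothetical probe for "try to contact the node". `TaskMarker`, `tryStart`, `heartbeat` and `finish` are illustrative names. The leader would call `tryStart` only after its own leadership check passes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Predicate;

// Sketch only: an in-memory stand-in for a persistent "task in progress" znode.
// TaskMarker, Marker and ownerResponds are hypothetical, not ZooKeeper APIs.
class TaskMarker {
    // What the marker records: which member owns the running task and its last heartbeat.
    static final class Marker {
        final String owner;
        final long heartbeatMillis;

        Marker(String owner, long heartbeatMillis) {
            this.owner = owner;
            this.heartbeatMillis = heartbeatMillis;
        }
    }

    private final Map<String, Marker> markers = new HashMap<>(); // task name -> marker
    private final long staleAfterMillis;
    private final Predicate<String> ownerResponds; // hypothetical liveness probe of the owner

    TaskMarker(long staleAfterMillis, Predicate<String> ownerResponds) {
        this.staleAfterMillis = staleAfterMillis;
        this.ownerResponds = ownerResponds;
    }

    // Called by the leader before starting a task; returns true if this member may run it.
    // synchronized models the atomic check-and-write that a real store would need.
    synchronized boolean tryStart(String task, String me, long nowMillis) {
        Marker m = markers.get(task);
        if (m != null) {
            boolean stale = nowMillis - m.heartbeatMillis > staleAfterMillis;
            if (!stale || ownerResponds.test(m.owner)) {
                return false; // a previous run is, or may still be, in progress
            }
            // Stale marker and unreachable owner: treat the run as abandoned and take over.
        }
        markers.put(task, new Marker(me, nowMillis));
        return true;
    }

    // The owner refreshes its heartbeat periodically while the task runs.
    synchronized void heartbeat(String task, String me, long nowMillis) {
        Marker m = markers.get(task);
        if (m != null && m.owner.equals(me)) {
            markers.put(task, new Marker(me, nowMillis));
        }
    }

    // The owner clears the marker when the task finishes; a non-owner's call is ignored.
    synchronized void finish(String task, String me) {
        Marker m = markers.get(task);
        if (m != null && m.owner.equals(me)) {
            markers.remove(task);
        }
    }
}
```

Against real ZooKeeper, the `synchronized` check-and-replace would have to become a conditional write. One possible approach is to read the marker with `getData`, then pass the version from the returned `Stat` to `setData` or `delete`, so the takeover fails if another member changed the marker in between. The staleness threshold also needs to be comfortably longer than the heartbeat interval.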

Am I correct in assuming that I cannot use an ephemeral node for the "task
in progress" flag? And that a non-ephemeral node with staleness checking is
the way to go? This seems like a pretty common use case.


-- Eric