Zookeeper >> mail # user >> question on lock recipe


Scott Fines 2011-07-20, 21:12
Re: question on lock recipe
Hi Will,
  We have done something similar with a custom realtime distributed
queue.  It's basically a queue divided into channels, with pushes hashed
evenly across the channels and a single consumer thread per channel.  We
catch all disconnect exceptions and simply call worker.stop() on the
worker that is actually reading data from the queue.  The worker is a
Runnable that is submitted to the thread pool, and it checks whether it
should still run each time the pool runs it.

This occasionally results in workers that pause and then restart while our
ZK connections normalize, but it prevents us from consuming when we
aren't sure we hold the lock.  As you said, our consumer checks whether it
should still be running on every iteration of the loop; I haven't found
any other way around this.
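
In rough outline, the worker side looks something like the sketch below
(ChannelWorker and readOneMessage are made-up names, not the actual code):

    import org.apache.zookeeper.KeeperException;

    // Only a sketch of the pattern described above, not the real queue code.
    class ChannelWorker implements Runnable {
        private volatile boolean shouldRun = true;

        // Called when a disconnect-style exception is seen; the worker stops
        // consuming until it is resubmitted to the thread pool.
        public void stop() {
            shouldRun = false;
        }

        @Override
        public void run() {
            // Re-check the flag on every pass so we never consume while the
            // connection (and therefore the lock) is in doubt.
            while (shouldRun) {
                try {
                    readOneMessage();
                } catch (KeeperException.ConnectionLossException e) {
                    stop();   // disconnected: lock state uncertain, pause this channel
                } catch (KeeperException.SessionExpiredException e) {
                    stop();   // session (and any ephemeral lock node) is gone
                } catch (KeeperException e) {
                    stop();   // treat other ZooKeeper errors conservatively
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    stop();
                }
            }
        }

        private void readOneMessage() throws KeeperException, InterruptedException {
            // ZooKeeper reads for this channel's queue go here (omitted).
        }
    }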

Todd

On Wed, 2011-07-20 at 17:03 -0400, Will Johnson wrote:

> The Lock recipe has an overview description of "Fully distributed locks that
> are globally synchronous, meaning at any snapshot in time no two clients
> think they hold the same lock."  We've implemented this pattern, but we've
> run into an issue handling ZooKeeper errors that seems to violate the
> semantics of 'no two clients think they have the lock.'  For example:
>
> Thread1.Client1.lock();
> Thread2.Client2.lock();
>
> // client1 gets the lock so he starts some work
> Thread1.client1.doWork();
>
> // but now I get a session timeout
> // in the worst case it's because the doWork() method caused a full GC that
> // took longer than sessionTimeout
> // my client then has to reconnect with a new session ID
> Thread1.client1.reconnect();
>
> But now my question is: how have people handled this case to notify
> Thread1.client1 that it no longer holds the lock?  Without a lot of
> pedantic calls to Thread1.client1.doIStillHaveTheLock() inside the doWork()
> method, it seems like two clients both think they have the lock.  Even if you
> make repeated calls to check the state of your lock, you still have small
> windows of time where two clients are in the lock.  I could interrupt Thread1
> when reconnecting, but if you're using the lock for multithreaded
> synchronization that won't help.
>
> I realize the limitations of ZooKeeper in this case, but I also hope someone
> else has solved this problem intelligently before.
>
> - will
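
The doIStillHaveTheLock() style check described above might look roughly like
the sketch below (LockGuard, lockNodePath and doOneSmallStep are made-up names;
this only illustrates the idea, not a complete lock implementation).  Checking
before every small unit of work narrows the window in which two clients can
both believe they hold the lock, but it cannot close it: the session can still
expire between a check and the step that follows.

    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    class LockGuard {
        private final ZooKeeper zk;
        private final String lockNodePath;  // ephemeral node created when the lock was acquired

        LockGuard(ZooKeeper zk, String lockNodePath) {
            this.zk = zk;
            this.lockNodePath = lockNodePath;
        }

        // "Do I still have the lock?"  The lock node is ephemeral, so if the
        // session expired it is deleted; if we are currently disconnected we
        // cannot tell, so any failure is treated as "not safely held".
        boolean stillHeld() {
            try {
                if (zk.getState() != ZooKeeper.States.CONNECTED) {
                    return false;
                }
                Stat stat = zk.exists(lockNodePath, false);
                return stat != null;
            } catch (KeeperException e) {
                return false;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }

    class GuardedWorker {
        // doWork() broken into bounded steps, with a check before each step.
        void doWork(LockGuard guard) {
            for (int step = 0; step < 100; step++) {
                if (!guard.stillHeld()) {
                    return;   // bail out; another client may now hold the lock
                }
                doOneSmallStep(step);
            }
        }

        private void doOneSmallStep(int step) {
            // one bounded unit of work (omitted)
        }
    }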