Zookeeper, mail # user - The State of Python Zookeeper libraries and collaboration


Re: The State of Python Zookeeper libraries and collaboration
Ben Bangert 2012-05-17, 21:41
On 5/17/12 9:24 AM, Mark Gius wrote:
> Are you planning on writing a pure-Python client (does not call out to the
> C bindings via zkpython) or are you planning on writing a solid wrapper
> around the C bindings?  Implementing a pure-Python client would go a long
> way towards making various green thread frameworks work without having to
> jump through hoops.  I think we'd have to add support to Jute so that it
> would generate Python data classes kind of like it does now with Java and C.
>
> Assuming you go with a wrapper around the C bindings, I would suggest you
> take a look at something called "xthread.py", which was a thread
> synchronization primitives library that a guy proposed to the eventlet
> project a while back and provides Lock, Notify, etc etc which are safe to
> use and notify between real threads and green threads.  It gives a safe way
> to deal with sending data and doing proper locks without having to worry
> about calling out to "green" things from within non greened contexts (such
> as the zk callback functions).  It's eventlet specific, but the concepts
> and probably a fair amount of the code can be adapted and used.
>
> Or pure-python.  That works too. :D

I looked at writing a base zookeeper replacement that includes the
higher-level APIs and uses ctypes to talk to an installed Zookeeper C
binding rather than going through the Python C binding. This has the
advantage of working in PyPy; however, it was quite a pain and is
probably somewhat slower than the Python C binding. It would be pure
Python, but not quite in the way you're referring to, as it's not
talking directly to Zookeeper in pure Python but still using the C API.
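
For illustration only, here is a minimal sketch of what "using ctypes to
talk to an installed C binding" can look like. It is not the code I
experimented with. The helper name load_zookeeper_lib is hypothetical,
and the library name "zookeeper_mt" is an assumption about how the
multithreaded C client is installed on a given system. The only C call
used is zoo_set_debug_level from zookeeper.h.

```python
import ctypes
import ctypes.util


def load_zookeeper_lib(name="zookeeper_mt"):
    """Hypothetical helper: return a ctypes handle to the ZooKeeper C
    client library, or None if it cannot be found or loaded."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


lib = load_zookeeper_lib()
if lib is not None:
    # void zoo_set_debug_level(ZooLogLevel level); 1 is ZOO_LOG_LEVEL_ERROR.
    lib.zoo_set_debug_level.argtypes = [ctypes.c_int]
    lib.zoo_set_debug_level.restype = None
    lib.zoo_set_debug_level(1)
```

Every C function has to be declared by hand this way (argument types,
return type, callback signatures), which is where most of the pain
comes from.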

I'm mainly looking at having a higher-level API that makes it easier to
use Zookeeper in a less error-prone manner. Like Netflix's Curator, only
with a Pythonic API, since their API is fairly heavily grounded in Java
limitations. So it'll have convenience methods, a consistent API that's
usable under greenlets or threaded code, and all the recipes, and it
will be well tested and documented.

Lots of things using Zookeeper make the assumption (rightly or wrongly)
that watch events are executed sequentially (the C API does this, for
example, as does the Java one AFAIK).

To handle this, my plan when using greenlets is to spawn a greenlet
watch processor immediately during ZK client initialization. It works
off a normal, non-gevent-patched Queue object, and the callbacks drop a
lambda onto that queue from the ZK thread. This ensures that, even in an
async environment, all watch events are by default processed in the same
order the ZK client receives them (a watch func could of course spawn a
greenlet for itself, but at that point it's already safely in the
'green context').
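
A rough sketch of the queue idea, using only the standard library so it
runs anywhere. This is not the test script mentioned below, and the
names (WatchDispatcher, enqueue, demo) are made up for illustration. A
plain thread stands in for the greenlet watch processor, and another
thread stands in for the ZK client's callback thread, so it shows only
the ordering guarantee, not the gevent integration.

```python
import queue
import threading


class WatchDispatcher:
    """Hypothetical helper: runs watch callbacks one at a time, in the
    order they were enqueued."""

    _STOP = object()

    def __init__(self):
        # A plain (unpatched) Queue is safe to put to from any real thread.
        self._events = queue.Queue()
        # Stand-in for the greenlet watch processor described above.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, func, *args):
        # Called from the client's callback thread: only touches the queue.
        self._events.put(lambda: func(*args))

    def _run(self):
        while True:
            job = self._events.get()
            if job is self._STOP:
                break
            job()  # callbacks run sequentially, in arrival order

    def close(self):
        self._events.put(self._STOP)
        self._worker.join()


def demo():
    seen = []
    dispatcher = WatchDispatcher()

    def fake_zk_thread():
        # Simulates watch events arriving on the ZK client's own thread.
        for i in range(5):
            dispatcher.enqueue(seen.append, ("changed", "/node", i))

    producer = threading.Thread(target=fake_zk_thread)
    producer.start()
    producer.join()
    dispatcher.close()
    return seen
```

Calling demo() returns the five events in the order they were enqueued,
since a single consumer drains a FIFO queue.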

The Kazoo author (David LaBissoniere) has written a small test script
that verifies this approach appears to work. I'll want to test it under
heavy load, of course, but it seems like a rather safe and sane approach.
It also avoids a lot of the hairier pipe code that tries to shuttle
things safely back and forth.

If someone comes up with a pure Python client for Zookeeper, I'd be happy
to work on supporting that as well, but it's a bit beyond the level of
direct involvement I can provide.

--
Ben Bangert
(ben@ || http://) groovie.org