Zookeeper >> mail # user >> The State of Python Zookeeper libraries and collaboration


The State of Python Zookeeper libraries and collaboration
It would seem that about 6 months ago or so, there wasn't much out
there in terms of higher level Python libs for Zookeeper. There was the
Cloudera article on queues, and txzookeeper (which I'm sure many of us
not using twisted immediately ignored).

In the time since, several people including myself needed solutions
involving Zookeeper with Python and, seeing nothing out there, all
apparently began writing libraries (judging from the project timelines
in most cases). I've been collaborating with the author of zc.zk (Jim
Fulton) for a while and we decided it'd make more sense to merge our
efforts. In this spirit I began contacting all the other developers to
gauge their interest and most have been interested.

I created a python-zk organization on GitHub to be the home for this
effort and moved over the zc.zk library (which people apparently had a
hard time locating), along with the fairly widely used statically compiled
Python Zookeeper binding.

https://github.com/python-zk

Next up is to create the new merged core which I plan on basing mostly
around the cleanest implementation I have seen so far (which also
happens to be one of the only gevent compatible ones), kazoo. I've
talked with the primary author of Kazoo, and the name may remain with
the new merged package or it may get a new name if that doesn't work.
I'm not terribly tied to names as much as I am to solid, well tested,
well documented working code... but having catchy names does seem to help.

I'm currently working on this full-time, so I expect it to be in a
usable state in a week or so (hopefully not too optimistic). If you're
interested in helping out, the more the better; please feel free to
e-mail me directly or respond here.

This stuff is complex; it needs many eyes on it and lots of code review.

This hopefully explains why I'm so interested in having a single Python
Zookeeper library of similar caliber to Netflix's Curator that has:
- Very thorough unit/integration tests (100% coverage minimum)
- Cleanly handles connection loss
- Works under gevent or threaded/blocking
- Very well documented (API docs and narrative)
- Implements all the Zookeeper recipes
- Service Discovery/Management
- Higher level utility functions for common Zookeeper tasks

In the meantime, here is a summary of my research efforts and code
review (if something isn't accurate, please feel free to correct).

Please don't take this as a critique; I'm just trying to document what
is out there for my own reference on merging and hopefully so other
people coming along don't continue to replicate this. :)

gevent-zookeeper
    - https://github.com/jrydberg/gevent-zookeeper/

    - Works under gevent
    - No tests
    - No documentation

kazoo
    - https://github.com/nimbusproject/kazoo

    - Resilient Client
    - Basic Lock (Uses UUID properly)
    - Some Tests (Integrated)
    - No documentation (doc strings only)
    - Works under gevent

pykeeper
    - https://github.com/nkvoll/pykeeper

    - Higher level client (not resilient to errors)
    - Documentation
    - Some tests (Integrated)

txzookeeper
    - JuJu Team
    - https://launchpad.net/txzookeeper

    - Resilient Client
    - Doesn't handle create node edge-case
    - Basic Lock (open bug filed to handle the UUID bit)
    - Queue, ReliableQueue, SerializedQueue
    - No documentation (doc strings only)
    - Usable only from twisted
    - Well tested (Integrated)

twitter zookeeper lib
    - https://github.com/twitter/commons/tree/master/src/python/twitter/common/zookeeper

    - Resilient Client
    - Handles create node edge-case
    - Service Registration/Discovery
    - Some documentation
    - Well tested (Integrated)
    - Tied to a lot of twitter commons code

zkpython (improvements to a fork of the official bindings)
    - https://github.com/duncf/zkpython/

    - Resilient Client
    - Basic Lock (Using unique id rather than UUID)
    - Handles create node edge-case
    - Some Tests (Integrated)
    - No additional docs

zc.zk
    - https://github.com/python-zk/zc.zk

    - Non-resilient Client (reconnects must be handled)
    - Higher level automatic watch functionality
    - Service Registration/Discovery
    - Well tested (Unit and Integration tests)
    - Documented (on usage, source code is missing doc strings)

zktools
    - https://github.com/mozilla-services/zktools

    - Relies on zc.zk
    - Shared Read/Write Locks
    - AsyncLock
    - Revocable Locks
    - Tests (Integrated)

zoop
    - https://github.com/davidmiller/zoop

    - Doesn't handle create node edge-case
    - Doesn't handle retryable exceptions
    - Revocable Lock (Doesn't handle create node edge-case, uses a permanent
                      node instead of ephemeral)
    - Tested (Unit tests via ZK mocks)
    - Well Documented (doc strings and narrative docs)
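
For readers unfamiliar with the "create node edge-case" and the UUID
notes in the summary above, here is a rough sketch of the problem as it
is commonly described: a sequential create can succeed on the server
while the reply is lost to a connection drop, so a blind retry leaves
two nodes behind. Putting a unique token in the node name lets the
client look for its own node before retrying. FakeZk, ConnectionLoss,
and create_lock_node are hypothetical names made up for this
illustration; they are not taken from any of the libraries listed, and
the in-memory fake only stands in for a real ensemble.

```python
import uuid


class ConnectionLoss(Exception):
    """Hypothetical stand-in for a ZooKeeper connection-loss error."""


class FakeZk:
    """In-memory stand-in for a ZooKeeper client; not a real binding."""

    def __init__(self, fail_after_create=0):
        self.children = []
        self.counter = 0
        # Number of creates whose reply is "lost" after the node is stored.
        self.fail_after_create = fail_after_create

    def create_sequential(self, prefix):
        name = "%s%010d" % (prefix, self.counter)
        self.counter += 1
        self.children.append(name)
        if self.fail_after_create > 0:
            # The node now exists on the "server", but the reply is lost.
            self.fail_after_create -= 1
            raise ConnectionLoss()
        return name

    def get_children(self):
        return list(self.children)


def create_lock_node(zk, max_tries=3):
    """Create exactly one sequential node, even if a create reply is lost."""
    token = uuid.uuid4().hex
    prefix = "lock-%s-" % token
    for _ in range(max_tries):
        try:
            return zk.create_sequential(prefix)
        except ConnectionLoss:
            # The create may have succeeded; look for our token before
            # retrying so we never leave a second node behind.
            mine = [c for c in zk.get_children() if c.startswith(prefix)]
            if mine:
                return mine[0]
    raise ConnectionLoss()
```

With FakeZk(fail_after_create=1), the first create raises after the
node is stored, and create_lock_node returns that existing node instead
of creating a duplicate. A real client would also have to cope with the
children lookup itself failing, which this sketch leaves out.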

Ben Bangert
(ben@ || http://) groovie.org