The State of Python Zookeeper libraries and collaboration
Ben Bangert 2012-05-17, 00:50

It would seem that about 6 months ago or so, there wasn't much out
there in terms of higher level Python libs for Zookeeper. There was the
Cloudera article on queues, and txzookeeper (which I'm sure many of us
not using twisted immediately ignored).

In the time since, several people including myself needed solutions
involving Zookeeper with Python and, seeing nothing out there, all
apparently began writing libraries (judging from the project timelines
in most cases). I've been collaborating with the author of zc.zk (Jim
Fulton) for a while and we decided it'd make more sense to merge our
efforts. In this spirit I began contacting all the other developers to
gauge their interest and most have been interested.

I created a python-zk organization on GitHub to be the home for this
effort and moved over the zc.zk library (which people apparently had a
hard time locating), along with the fairly widely used statically compiled
Python Zookeeper binding.

https://github.com/python-zk

Next up is to create the new merged core which I plan on basing mostly
around the cleanest implementation I have seen so far (which also
happens to be one of the only gevent compatible ones), kazoo. I've
talked with the primary author of Kazoo, and the name may remain with
the new merged package or it may get a new name if that doesn't work.
I'm not terribly tied to names as much as I am to solid, well tested,
well documented working code... but having catchy names does seem to help.

I'm currently working on this full-time, so I expect it to be in a
usable state in a week or so (hopefully not too optimistic). If you're
interested in helping out, the more the better, please feel free to
e-mail me directly or respond here.

This stuff is complex, it needs many eyes on it and lots of code review.

This hopefully explains why I'm so interested in having a single Python
Zookeeper library of similar caliber to Netflix's Curator that has:
- Very thorough unit/integration tests (100% coverage minimum)
- Cleanly handles connection loss
- Works under gevent or threaded/blocking
- Very well documented (API docs and narrative)
- Implements all the Zookeeper recipes
- Service Discovery/Management
- Higher level utility functions for common Zookeeper tasks

In the meantime, here is a summary of my research efforts and code
review (if something isn't accurate, please feel free to correct).

Please don't take this as a critique, I'm just trying to document what
is out there for my own reference on merging and hopefully so other
people coming along don't continue to replicate this. :)


gevent-zookeeper
- https://github.com/jrydberg/gevent-zookeeper/

- Works under gevent
- No tests
- No documentation

kazoo
- https://github.com/nimbusproject/kazoo

- Resilient Client
- Basic Lock (Uses UUID properly)
- Some Tests (Integrated)
- No documentation (doc strings only)
- Works under gevent

pykeeper
- https://github.com/nkvoll/pykeeper

- Higher level client (not resilient to errors)
- Documentation
- Some tests (Integrated)

txzookeeper - JuJu Team
- https://launchpad.net/txzookeeper

- Resilient Client
- Doesn't handle create node edge-case
- Basic Lock (open bug filed to handle the UUID bit)
- Queue, ReliableQueue, SerializedQueue
- No documentation (doc strings only)
- Usable only from twisted
- Well tested (Integrated)

twitter zookeeper lib
- https://github.com/twitter/commons/tree/master/src/python/twitter/common/zookeeper

- Resilient Client
- Handles create node edge-case
- Service Registration/Discovery
- Some documentation
- Well tested (Integrated)
- Tied to a lot of twitter commons code

zkpython (improvements to a fork of the official bindings)
- https://github.com/duncf/zkpython/

- Resilient Client
- Basic Lock (Using unique id rather than UUID)
- Handles create node edge-case
- Some Tests (Integrated)
- No additional docs

zc.zk
- https://github.com/python-zk/zc.zk

- Non-resilient Client (reconnects must be handled)
- Higher level automatic watch functionality
- Service Registration/Discovery
- Well tested (Unit and Integration tests)
- Documented (on usage, source code is missing doc strings)

zktools
- https://github.com/mozilla-services/zktools

- Relies on zc.zk
- Shared Read/Write Locks
- AsyncLock
- Revokable Locks
- Tests (Integrated)

zoop
- https://github.com/davidmiller/zoop

- Doesn't handle create node edge-case
- Doesn't handle retryable exceptions
- Revokable Lock (Doesn't handle create node edge-case, uses a
  permanent node instead of ephemeral)
- Tested (Unit tests via ZK mocks)
- Well Documented (doc strings and narrative docs)

Ben Bangert
(ben@ || http://) groovie.org
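
For anyone wondering what the "create node edge-case" and the UUID notes in the list refer to: if the connection drops while a sequential node is being created, the client cannot tell whether the create went through, so a blind retry can leave two nodes behind. Putting a unique id in the node name lets the client look for its own node before retrying. Below is a rough sketch of that idea only. `FakeZK`, `ConnectionLoss` and `create_lock_node` are hypothetical names invented for illustration and are not the API of any library listed above.

```python
import uuid


class ConnectionLoss(Exception):
    """Stand-in for a connection-loss error; not a real library exception."""


class FakeZK:
    """Hypothetical in-memory stand-in for a Zookeeper client (illustration only)."""

    def __init__(self, drop_reply_once=False):
        self.children = []          # names of nodes under the lock path
        self._counter = 0
        self._drop_reply_once = drop_reply_once

    def create_sequential(self, prefix):
        # The "server" always creates the node ...
        name = "%s%010d" % (prefix, self._counter)
        self._counter += 1
        self.children.append(name)
        # ... but the reply may be lost before the client sees it.
        if self._drop_reply_once:
            self._drop_reply_once = False
            raise ConnectionLoss()
        return name

    def get_children(self):
        return list(self.children)


def create_lock_node(zk, max_tries=3):
    """Create exactly one lock node even if a create reply is lost."""
    prefix = uuid.uuid4().hex + "-lock-"
    for _ in range(max_tries):
        # Before (re)trying, check whether an earlier attempt already succeeded.
        mine = [c for c in zk.get_children() if c.startswith(prefix)]
        if mine:
            return mine[0]
        try:
            return zk.create_sequential(prefix)
        except ConnectionLoss:
            continue
    raise ConnectionLoss()


zk = FakeZK(drop_reply_once=True)
node = create_lock_node(zk)
```

Without the prefix check, the retry after the lost reply would create a second node and the first one would sit in the lock queue with no owner.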
-
Re: The State of Python Zookeeper libraries and collaboration
Mark Gius 2012-05-17, 16:24

Are you planning on writing a pure-python client (one that does not call
out to the C bindings via zkpython) or are you planning on writing a solid
wrapper around the C bindings? Implementing a pure-python client would go a
long way towards making various green thread frameworks work without having
to jump through hoops. I think we'd have to add support to Jute so that it
would generate python data classes kind of like it does now with Java and C.

Assuming you go with a wrapper around the C bindings, I would suggest you
take a look at something called "xthread.py", a thread synchronization
primitives library that a guy proposed to the eventlet project a while
back. It provides Lock, Notify, etc., which are safe to use and notify
between real threads and green threads. It gives a safe way to deal with
sending data and doing proper locks without having to worry about calling
out to "green" things from within non-greened contexts (such as the zk
callback functions). It's eventlet specific, but the concepts and probably
a fair amount of the code can be adapted and used.

Or pure-python. That works too. :D

Mark
-
Re: The State of Python Zookeeper libraries and collaboration
Ben Bangert 2012-05-17, 21:41

On 5/17/12 9:24 AM, Mark Gius wrote:
> Are you planning on writing a pure-python client (does not call out to the
> C bindings via zkpython) or are you planning on writing a solid wrapper
> around the C bindings. Implementing a pure-python client would go a long
> way towards making various green thread frameworks work without having to
> jump through hoops. I think we'd have to add support to Jute so that it
> would generate python data classes kind of like it does now with Java and C.
>
> Assuming you go with a wrapper around the C bindings, I would suggest you
> take a look at something called "xthread.py", which was a thread
> synchronization primitives library that a guy proposed to the eventlet
> project a while back and provides Lock, Notify, etc etc which are safe to
> use and notify between real threads and green threads. It gives a safe way
> to deal with sending data and doing proper locks without having to worry
> about calling out to "green" things from within non greened contexts (such
> as the zk callback functions). It's eventlet specific, but the concepts
> and probably fair amount of the code can probably be adapted and use.
>
> Or pure-python. That works too. :D

I looked at writing a base zookeeper replacement that includes the
higher level APIs and uses ctypes to talk to an installed zookeeper C
binding rather than using the Python C binding. This has the advantage
of working in PyPy, however it was quite a pain and is probably
slower than the Python C binding. This would be pure Python,
but not quite in the way you're referring to, as it's not talking directly
to Zookeeper using pure Python, but still using the C API.

I'm mainly looking at having a higher level API that makes it easier to
use Zookeeper in a less error-prone manner. Like Netflix's Curator, only
with a Pythonic API since their API is fairly heavily grounded in Java
limitations. So it'll have convenience methods, a consistent API that's
usable under greenlets or threaded code, all the recipes, and be well
tested and documented.

Lots of things using Zookeeper make the assumption (rightly or wrongly) that
watch events are executed sequentially (the C API does this for example
as does the Java one AFAIK).

To handle this my plan when using greenlets was to immediately spawn a
greenlet watch processor during the ZK client initialization that would
work off a normal non-gevent-patched Queue object, and the callbacks
will drop a lambda onto the queue from the ZK thread. This ensures even
in an async environment that by default all watch events are processed
in the same order the ZK client receives them (a watch func could of
course spawn a greenlet for itself, but at that point it's already safely
in the 'green context').

The Kazoo author (David LaBissoniere) has written a small test script
that verifies this approach appears to work. I'll want to test it under
heavy load of course but it seems like a rather safe and sane approach.
It also avoids a lot of the hairier pipe code that tries to shuttle
things safely back and forth.

If someone comes up with a pure Python client to Zookeeper, I'd be happy
to work on supporting that as well but it's a bit beyond the level of
direct involvement I can provide.

-- 
Ben Bangert
(ben@ || http://) groovie.org
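
A rough sketch of the watch-processor idea, using only the standard library. This is not the test script mentioned in the message: a plain thread stands in for the greenlet worker, and `fake_zk_thread`, `watch_worker` and `on_watch` are hypothetical names made up for the illustration. Under gevent the worker would be a greenlet, and the hand-off between the OS thread and the gevent hub needs more care than this sketch shows.

```python
import queue
import threading

watch_queue = queue.Queue()   # plain, non-patched queue shared by both threads
processed = []                # records the order in which watches ran
_STOP = object()              # sentinel telling the worker to exit


def watch_worker():
    """Run queued callbacks one at a time, in arrival order."""
    while True:
        func = watch_queue.get()
        if func is _STOP:
            break
        func()


def on_watch(event):
    # A user-level watch function; here it only records the event.
    processed.append(event)


def fake_zk_thread():
    """Hypothetical stand-in for the ZK thread delivering watch events."""
    for i in range(5):
        event = "event-%d" % i
        # Bind the event now (default argument) so each lambda keeps its own.
        watch_queue.put(lambda e=event: on_watch(e))
    watch_queue.put(_STOP)


worker = threading.Thread(target=watch_worker)
producer = threading.Thread(target=fake_zk_thread)
worker.start()
producer.start()
producer.join()
worker.join()
```

Because a single worker drains a FIFO queue, the callbacks run in the order they were queued, which is the sequential-watch guarantee the message is after.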
-
Re: The State of Python Zookeeper libraries and collaboration
Alan D. Cabrera 2012-05-18, 03:24

On May 17, 2012, at 2:41 PM, Ben Bangert wrote:

> If someone comes up with a pure Python client to Zookeeper, I'd be happy
> to work on supporting that as well but its a bit beyond the level of
> direct involvement I can provide.

I'm still goofing around with a pure Python client. I've modified the
Jute compiler to also generate the requests and response objects in Python:

http://pastie.org/3928662

It seems to be communicating w/ the Zookeeper instances at work perfectly
fine but we don't use SASL.

I'm a novice Python programmer and this is a simple exercise for me to cut
my teeth on.


Regards,
Alan
-
Re: The State of Python Zookeeper libraries and collaboration
Mark Gius 2012-05-18, 17:11

Are your patches to Jute available somewhere?

Mark
-
Re: The State of Python Zookeeper libraries and collaboration
Alan D. Cabrera 2012-06-03, 17:44

My Jute changes are kind of hacky since not all the types in Jute are
used; I made the bare minimum of changes to get my requests and responses
in Python. I also had to make a number of hand changes to the generated
Python classes, e.g. I added the request header type codes to the packets.

My code, which is a very rough draft, can be found at

https://github.com/maguro/pookeeper

BTW, I'm happy to take suggestions for a good name. :)

As you can see I've only tested a small subset of requests.


Regards,
Alan
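
To give a feel for what request classes generated from Jute have to produce on the wire, here is an independent sketch (not taken from pookeeper or the pastie above). It assumes the usual Jute encoding: big-endian 4-byte ints, length-prefixed UTF-8 strings, one-byte booleans, and a 4-byte length in front of each packet. The opcode value and the field layout of the getData request are assumptions that should be checked against the ZooKeeper sources, and all function names are made up.

```python
import struct

GET_DATA = 4  # assumed opcode for getData; verify against ZooDefs.OpCode


def encode_int(value):
    """Jute int: 4 bytes, big-endian, signed."""
    return struct.pack(">i", value)


def encode_string(text):
    """Jute string: int length followed by the UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_int(len(data)) + data


def encode_bool(flag):
    """Jute boolean: a single byte."""
    return b"\x01" if flag else b"\x00"


def encode_get_data(xid, path, watch):
    """Frame a getData request: length prefix, request header, then body."""
    header = encode_int(xid) + encode_int(GET_DATA)   # xid + type code
    body = encode_string(path) + encode_bool(watch)   # assumed field order
    payload = header + body
    return encode_int(len(payload)) + payload


packet = encode_get_data(1, "/foo", False)
```

The type code lives in the request header rather than in the request record itself, which may be why it had to be added to the generated classes by hand.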
-
Re: The State of Python Zookeeper libraries and collaboration
Patrick Hunt 2012-05-17, 22:46

Ben this is cool -- please keep us posted on your progress!

Given the research you've done please consider updating the client
binding wiki page, in particular list your project.
https://cwiki.apache.org/confluence/display/ZOOKEEPER/ZKClientBindings

Regards,

Patrick
-
Re: The State of Python Zookeeper libraries and collaboration
Martin Kou 2012-05-20, 19:29

A pure Python implementation shouldn't be needed for gevent. It should be
sufficient to run the Zookeeper client in async mode in a background thread
(with mt or st.. personally I prefer st), and then use an eventfd() or a
pipe() to notify gevent.

In fact, if you look into gevent's thread pool library - they're using
libev's async signal which is also based on eventfd() and pipe(). So it's
not a new trick.

Best Regards,
Martin Kou
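
The pipe() trick can be sketched with the standard library alone (POSIX only, since `select` on pipes does not work on Windows). Here `select.select` stands in for the gevent/libev loop and `background_client` is a hypothetical stand-in for the thread running the Zookeeper client; no real client or gevent API is used, so treat it as an outline of the mechanism rather than working integration code.

```python
import os
import queue
import select
import threading

results = queue.Queue()          # data handed over from the background thread
read_fd, write_fd = os.pipe()    # the pipe is used only as a wake-up signal


def background_client():
    """Hypothetical stand-in for the thread running the ZK client."""
    for i in range(3):
        results.put("completion-%d" % i)
        os.write(write_fd, b"x")     # one byte per queued result


received = []
t = threading.Thread(target=background_client)
t.start()

while len(received) < 3:
    # The event loop would watch read_fd alongside its other descriptors.
    readable, _, _ = select.select([read_fd], [], [], 5.0)
    if not readable:
        break                        # timed out; give up in this sketch
    woken = os.read(read_fd, 1024)   # may drain several wake-up bytes at once
    for _ in woken:
        received.append(results.get_nowait())

t.join()
os.close(read_fd)
os.close(write_fd)
```

The result is queued before the wake-up byte is written, so every byte read from the pipe is guaranteed to have a matching item waiting in the queue.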
-
Re: The State of Python Zookeeper libraries and collaboration
Ben Bangert 2012-05-21, 19:58

On 5/20/12 12:29 PM, Martin Kou wrote:
> A pure Python implementation shouldn't be needed for gevent. It should be
> sufficient to run the Zookeeper client in async mode in a background thread
> (with mt or st.. personally I prefer st), and then use an eventfd() or a
> pipe() to notify gevent.
>
> In fact, if you look into gevent's thread pool library - they're using
> libev's async signal which is also based on eventfd() and pipe(). So it's
> not a new trick.

The current kazoo implementation uses something like this to pass off
the events. Also, under a test script, it *seems* to work if the ZK
thread plants a lambda onto a gevent Queue for a gevent greenlet worker
to then execute.

Here's an example script testing it (when use_gevent is set to
False, the other greenlet fails to print as it's using a blocking Queue
of course):
http://paste.ofcode.org/37qxZAYzqYe43zbuFa3DqNf

I haven't tried this out with the Python ZK lib. Do you know offhand if
this is going to run into any issue with gevent having a problem with
items being added to its queue from the other thread?

I'd like to avoid the extra complexity of the pipes and such if this is
going to work (which the test script seems to). I was also going with
this approach to ensure sequential watch execution when used with gevent
(and the watch func could spawn itself as a new greenlet if it needs to
yield, etc.)

-- 
Ben Bangert
(ben@ || http://) groovie.org
-
Re: The State of Python Zookeeper libraries and collaboration
Mark Gius 2012-05-21, 20:23

On Mon, May 21, 2012 at 12:58 PM, Ben Bangert wrote:

> On 5/20/12 12:29 PM, Martin Kou wrote:
> > A pure Python implementation shouldn't be needed for gevent. It should be
> > sufficient to run the Zookeeper client in async mode in a background thread
> > (with mt or st.. personally I prefer st), and then use an eventfd() or a
> > pipe() to notify gevent.
> >
> > In fact, if you look into gevent's thread pool library - they're using
> > libev's async signal which is also based on eventfd() and pipe(). So it's
> > not a new trick.
>
> The current kazoo implementation uses something like this to pass off
> the events. Also, under a test script, it *seems* to work if the ZK
> thread plants a lambda onto a gevent Queue for a gevent greenlet worker
> to then execute.
>
> Here's an example script testing it (when the use_gevent is set to
> False, the other greenlet fails to print as its using a blocking Queue
> of course):
> http://paste.ofcode.org/37qxZAYzqYe43zbuFa3DqNf
>
> I haven't tried this out with the Python ZK lib, do you know offhand if
> this is going to run into any issue with gevent having a problem with
> items being added to its queue from the other thread?
>
> I'd like to avoid the extra complexity of the pipe's and such if this is
> going to work (which the test script seems to). I was also going with
> this approach to ensure sequential watch execution when used with gevent
> (and the watch func could spawn itself as a new greenlet if it needs to
> yield, etc.)
>
> --
> Ben Bangert
> (ben@ || http://) groovie.org

+1 on avoiding complexity, and wrangling python_zk so that it behaves with
eventlet/gevent is very complex. The advantage of a pure-python client is
that we don't have to play these tricks to deal with the fact that the
multithreaded library produces threads which are not green safe, forcing
us to play games to ship code back into a green thread.

The other trick is that there is more than one "green thread" library out
there: gevent, eventlet, twisted, etc. A pure python implementation means
that we don't have to special case for various libraries. eventlet just
has to monkey patch the standard library, same with whatever gevent or
twisted do.

I'm happy to help with a client that interfaces with the C bindings, but I
think that a pure-python client would be cleaner and support a wider
variety of deployments.

Mark
-
Re: The State of Python Zookeeper libraries and collaboration
Duncan Findlay 2012-05-22, 04:17

On May 17, 2012, at 9:24 AM, Mark Gius wrote:

> Are you planning on writing a pure-python client (does not call out to the
> C bindings via zkpython) or are you planning on writing a solid wrapper
> around the C bindings. Implementing a pure-python client would go a long
> way towards making various green thread frameworks work without having to
> jump through hoops. I think we'd have to add support to Jute so that it
> would generate python data classes kind of like it does now with Java and C.

One nice thing about the C bindings is that the communication with
ZooKeeper happens in a separate C-thread. We have a number of applications
that like to chew through all the available CPU. In these applications
it's impossible to ensure that any individual Python thread gets scheduled
frequently enough (e.g. to send PINGs to the server).

So, personally I'd rather use the C bindings. ;-)

Duncan