Re: Zookeeper protocol weirdness and pure Python kazoo client
Ben Bangert 2012-08-31, 04:00
On Aug 30, 2012, at 5:10 PM, Henry Robinson <[EMAIL PROTECTED]> wrote:
> FWIW, this is my only reservation about a pure Python client - there isn't
> a spec, and three separate implementations that might have subtly different
> behaviours can be a nightmare to maintain. Ben - if you're able to turn any
> of your efforts towards documenting your observations about how the
> protocol actually works, that would be awesome.
Given that most people don't use the client directly and instead program against 'helpers' such as Curator, I'm not sure that's such a big deal. The C style of completion callbacks already wasn't terribly useful, given that Python has its own basic threading model and many people use gevent for an async style (or Twisted, which is also quite different).
The C binding falls apart badly with gevent, since gevent has issues talking across the C thread to the gevent hub. This necessitates a lot of very hacky Python code to try and bridge it, which isn't needed when it's pure Python. And trying to finagle the zookeeper logging stream into proper Python logging was another set of hacks.
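To illustrate the kind of bridging meant here, below is a minimal stdlib-only sketch of one common approach (the self-pipe trick): the foreign thread queues its result and writes a byte to a pipe, and the event-loop side waits on the pipe's file descriptor rather than on a thread lock. This is not necessarily how kazoo does it; ThreadBridge and its methods are hypothetical names, and under gevent the wait side would presumably use gevent's cooperative select instead of the blocking one. It assumes a Unix-like platform where select works on pipes.

```python
import os
import queue
import select
import threading


class ThreadBridge:
    """Hypothetical helper: hand results from a foreign thread to a loop."""

    def __init__(self):
        self._results = queue.Queue()
        self._read_fd, self._write_fd = os.pipe()

    def callback(self, value):
        # Runs on the foreign thread (standing in for a C library callback).
        # Queue the value first, then wake the waiting side with one byte.
        self._results.put(value)
        os.write(self._write_fd, b"\0")

    def wait(self, timeout=None):
        # Runs on the event-loop side: wait for the pipe to become readable.
        readable, _, _ = select.select([self._read_fd], [], [], timeout)
        if not readable:
            raise TimeoutError("no completion arrived in time")
        os.read(self._read_fd, 1)  # consume the wake-up byte
        return self._results.get_nowait()

    def close(self):
        os.close(self._read_fd)
        os.close(self._write_fd)


# Usage: a plain thread stands in for the C library's completion thread.
bridge = ThreadBridge()
worker = threading.Thread(target=bridge.callback, args=("created /node",))
worker.start()
result = bridge.wait(timeout=5)
worker.join()
bridge.close()
```

With a pure Python client none of this is needed, since the socket I/O already runs in whatever concurrency model the application uses.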
Right now, the code for the protocol handling looks very close to the Java client (Alan Cabrera did some great work on pookeeper which I refactored into kazoo). I don't see any reason to deviate too heavily, though of course there are some things that are nicer to implement for a more Pythonic feel. Keeping the 'feel' from a user perspective similar to the existing Zookeeper Programmers Guide is best just to avoid having to replicate docs.
On a happy note, Kazoo actually has a good amount of docs:
Unfortunately the C lib only has doc strings, which make the Java API docs look like a documentation heaven in comparison. And you can't even get to the C API docs online... I finally got tired of digging them out of the source and using doxygen so I copied them up to my own server here:
Some notes on the implementation...
Pure Python Zookeeper implementation:
- Approx. 280 lines of code for the socket response/request handling
- Approx. 250 lines of code for the request/response serialization/deserialization
- Anyone that knows Python reasonably well can trouble-shoot and contribute
- Can be used in gevent, PyPy, Jython (Jython doesn't even have a GIL!)
- Worst case... an exception bubbles up that you might have to catch
zkpython (Python binding to C lib):
- Approx. 1500 lines of C (Not including the C lib itself, which is another ~ 8000 lines of code)
- Anyone that knows and wants to read the C library *and* Python *and* the Python C binding tricks can trouble-shoot and contribute. And maybe the patches will actually be accepted and incorporated at some point...
- Only usable with CPython
- Worst case... Python segfaults
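To give a feel for why the request/response serialization in the pure Python list is so small, here is a rough sketch of encoding and decoding a connect request with nothing but the struct module. The field order (protocol version, last zxid seen, timeout, session id, password buffer) and the 4-byte big-endian length prefix are assumptions based on the Java client's ConnectRequest record, and the function names are hypothetical; this is not kazoo's actual code.

```python
import struct

# Assumed fixed-size header: int, long, int, long (big-endian, no padding).
CONNECT_HEADER = ">iqiq"


def write_buffer(data):
    """Encode bytes as a 4-byte big-endian length followed by the data."""
    return struct.pack(">i", len(data)) + data


def frame(payload):
    """Prefix a serialized payload with its 4-byte big-endian length."""
    return struct.pack(">i", len(payload)) + payload


def serialize_connect(protocol_version, last_zxid_seen, timeout_ms,
                      session_id, passwd):
    """Build a length-prefixed connect request packet."""
    body = struct.pack(CONNECT_HEADER, protocol_version, last_zxid_seen,
                       timeout_ms, session_id)
    return frame(body + write_buffer(passwd))


def deserialize_connect(packet):
    """Inverse of serialize_connect; returns the five fields as a tuple."""
    (length,) = struct.unpack_from(">i", packet, 0)
    body = packet[4:4 + length]
    fields = struct.unpack_from(CONNECT_HEADER, body, 0)
    offset = struct.calcsize(CONNECT_HEADER)
    (passwd_len,) = struct.unpack_from(">i", body, offset)
    passwd = body[offset + 4:offset + 4 + passwd_len]
    return fields + (passwd,)


# Usage: a brand-new session (id 0, all-zero 16-byte password).
packet = serialize_connect(0, 0, 10000, 0, b"\x00" * 16)
decoded = deserialize_connect(packet)
```

Every other request type follows the same pattern of fixed-size integers plus length-prefixed buffers, which is the sort of code most Python developers can read and debug without touching C.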
I really don't know many (hardly any) Python developers that know C well enough to debug it or dive into it. If their only Zookeeper experience is marred by bugs in the 'black box' of C, they'll move on to something else. Which saddens me cause I think Zookeeper is pretty awesome.
It took a week for us to figure out why our test suite failed on rare occasions. This wasn't helped by the fact that when you supply a bad session id/password, Zookeeper doesn't tell you what every other app known to mankind would (bad password or username).... it tells you SESSION_EXPIRED. Which is insanely confusing when you see your other client using that id/password still happily connected. We had to debug the Java server, use gdb and such to debug two C libs, etc. I really, really don't want to ever repeat that experience, it was that bad. :)
On a side-note, why on Earth does Zookeeper not give you an AUTH_FAILED when you fail the auth for the session ID/password on connect?
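A sketch of what this looks like from the client side, under two assumptions: that the connect response carries protocol version, negotiated timeout, session id and a password buffer in that order, and that (as the Java client appears to do) a rejected session is recognised by a non-positive negotiated timeout. The names below are hypothetical. The point is that the client cannot tell an expired session from a wrong id/password, because both arrive looking the same.

```python
import struct


class SessionExpired(Exception):
    """Raised when the server refuses to resume the requested session."""


def parse_connect_response(body):
    """Decode a connect response body (without its length prefix)."""
    protocol_version, timeout_ms, session_id = struct.unpack_from(">iiq", body, 0)
    offset = struct.calcsize(">iiq")
    (passwd_len,) = struct.unpack_from(">i", body, offset)
    passwd = body[offset + 4:offset + 4 + passwd_len]
    return protocol_version, timeout_ms, session_id, passwd


def check_session(body):
    """Return (session_id, passwd, timeout_ms) or raise SessionExpired."""
    _, timeout_ms, session_id, passwd = parse_connect_response(body)
    if timeout_ms <= 0:
        # Assumption: the same reply covers a genuinely expired session and
        # a bad session id/password, so no distinct auth error is possible.
        raise SessionExpired("session rejected: expired or bad id/password")
    return session_id, passwd, timeout_ms


def build_response(timeout_ms, session_id, passwd):
    """Test helper that fabricates a response body in the assumed layout."""
    return (struct.pack(">iiq", 0, timeout_ms, session_id)
            + struct.pack(">i", len(passwd)) + passwd)


# Usage: an accepted session, then a rejected one.
accepted = check_session(build_response(10000, 42, b"p" * 16))
try:
    check_session(build_response(0, 0, b"\x00" * 16))
    rejected = False
except SessionExpired:
    rejected = True
```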
I'd be happy to document and post more implementation details I've found about the actual protocol. I think it makes sense that powerful dynamic languages implement the protocol directly in a manner that's documented by the Zookeeper project rather than being crippled by using the C lib, and suffering segfaults as a result. Already for kazoo, I've been posting implementation details about how kazoo handles the C lib and bridging it to gevent/threads to help avoid common errors:
I can update that for protocol details, though it'd probably be more useful to have a page on the Zookeeper site itself that discusses and documents the protocol and how it should be implemented for consistency.
Well, we've been maintaining a static zookeeper python library here:
We've been adding critical patches to it as we've found them on Jira and in our own tests. Each Jira bug ticket is linked to on there. Several of those are patched in the custom ubuntu compiled distro of the python-zookeeper bindings as well.
But obviously at some point it becomes futile. We'd like to use the read-only feature, but there's no hope of that getting into the Python binding since it's still not in the C lib: https://issues.apache.org/jira/browse/ZOOKEEPER-827
There have been patches for that since 2010... and it's still not resolved. That's pretty discouraging, and given the lack of online generated C docs there's definitely a "we don't care much about the C lib" message being broadcast. It's very obvious the Java client is what gets the support. Searching Jira for 'zkpython' and seeing the various unresolved memory leaks and segfault issues is also sad. Kazoo is already getting use at several companies, and we all want this thing to be solid, to not seg-fault our Python, and to be able to easily trouble-shoot it without going through C gymnastics. :)