ZooKeeper dev mailing list

retreat from zookeeper
Thomas Koch 2011-05-19, 09:21
Hi,

as you may have noticed, I haven't been active in the ZooKeeper project for a
couple of months now. I've been a full-time student again since March, so any
further activity in Hadoop/ZooKeeper would have to be self-motivated.

Since I don't want to just fade away, and since I'll still be giving a talk
about ZooKeeper at the BerlinBuzzWords conference (Berlin, June 6/7), I have
listed the reasons why I would rather not work on the current ZooKeeper code
base anymore.

I plan the following structure for my talk:

1) theoretical model / protocol of ZooKeeper
2) practical applications, projects using ZooKeeper
3) shortcomings of the current ZooKeeper code base

A tentative brain dump of part three is listed below. I'd appreciate any
comments that could help me give a balanced presentation of the ZooKeeper
project.

If I needed a ZooKeeper implementation right now, I'd probably do a
minimal-feature rewrite in Scala + Akka. I do appreciate ZooKeeper as an
invaluable proof-of-concept implementation and pioneer. But, as in American
history, the pioneers should be followed by others who no longer look like
Clint Eastwood and who build tidier things.

The list:

* The code is tightly coupled
* most so-called "unit tests" are actually integration tests: they start the
whole application and then test one specific piece of functionality.

* no uniform configuration: settings come from command line parameters,
system properties, and a configuration file (Java properties)
* configuration properties are copied into static class members

* feature bloat on a fragile foundation: e.g. chroot combined with automatic
resubscription does not work

* implementation differs from the specification: e.g. the characters allowed
in a path

* still on Ant instead of Maven (whether that matters depends on how you see
Ant vs. Maven)

* circular object dependencies (e.g. ZooKeeper <-> ClientCnxn)

* methods with 100+ lines of code and conditionals nested well over 5 levels
deep

* a general attitude against refactoring; no knowledge or appreciation of
"Effective Java" (Josh Bloch) or "Clean Code" (Robert C. Martin)

* magic numbers instead of enums

* still bound to an inline copy of Jute (HadoopIO, a predecessor of Avro)
* and even hand-coded (de)serialization in leader election

* no client-only jar. Every client gets the full server code.

* the unwieldy API has prompted (at least) two client API wrappers: zkClient,
cages

* insane amounts of code duplication

* horrible, fragile thread programming: plenty of "XYZ extends Thread"
instead of
  - implements Runnable
  - or better: the executor framework
  - or much better: actors (see Akka)
  -> this leads to fear of refactoring, because nobody understands all the
synchronization requirements.

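To make the threading point concrete, here is a minimal Java sketch contrasting "XYZ extends Thread" with a Runnable submitted to the executor framework. All class names (ExecutorSketch, LegacyWorker, Worker) are hypothetical and not taken from the ZooKeeper code base, and actors are not shown. The idea is only that a Runnable separates the task from the thread that runs it, so the scheduling policy (pool size, shutdown, error propagation) lives in one place.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class ExecutorSketch {

    // Criticized style (hypothetical class): the task *is* a Thread, so task
    // logic and thread lifecycle are fused into one class.
    static class LegacyWorker extends Thread {
        private final AtomicInteger counter;
        LegacyWorker(AtomicInteger counter) { this.counter = counter; }
        @Override public void run() { counter.incrementAndGet(); }
    }

    // Alternative: the task is a plain Runnable; who runs it is decided elsewhere.
    static class Worker implements Runnable {
        private final AtomicInteger counter;
        Worker(AtomicInteger counter) { this.counter = counter; }
        @Override public void run() { counter.incrementAndGet(); }
    }

    // Starts and joins one hand-managed thread per task; returns completed tasks.
    static int runWithThreads(int tasks) {
        AtomicInteger counter = new AtomicInteger();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < tasks; i++) {
            Thread t = new LegacyWorker(counter);
            threads.add(t);
            t.start();
        }
        try {
            for (Thread t : threads) {
                t.join();
            }
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    // Runs the same tasks on a fixed pool of 2 threads; returns completed tasks.
    static int runWithExecutor(int tasks) {
        AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < tasks; i++) {
                futures.add(pool.submit(new Worker(counter)));
            }
            for (Future<?> f : futures) {
                f.get(); // waits; a task failure surfaces as ExecutionException
            }
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runWithThreads(4));  // 4
        System.out.println(runWithExecutor(4)); // 4
    }
}
```

Both variants do the same work, but in the second one the thread count and shutdown are decided by the caller rather than baked into every task class.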
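For the "magic numbers instead of enums" item, a minimal Java sketch of the difference, loosely in the spirit of "Effective Java" (use enums instead of int constants). The operation names and numeric wire codes below are hypothetical and arbitrary; they are not ZooKeeper's actual request codes. The enum keeps the wire value as data on each constant and rejects unknown codes explicitly instead of letting them fall through.

```java
import java.util.HashMap;
import java.util.Map;

public class OpCodeSketch {

    // Magic-number style: the meaning of each int lives only in the reader's head.
    static final int OP_CREATE = 1;
    static final int OP_DELETE = 2;

    static String describeLegacy(int op) {
        switch (op) {
            case OP_CREATE: return "create";
            case OP_DELETE: return "delete";
            default: return "unknown"; // any other int compiles and slips through
        }
    }

    // Enum style: a closed set of operations; codes here are hypothetical.
    enum Op {
        CREATE(1), DELETE(2), SET_DATA(3);

        private final int wireCode;
        private static final Map<Integer, Op> BY_CODE = new HashMap<>();

        // Runs after all constants exist, building the code -> constant lookup.
        static {
            for (Op op : values()) {
                BY_CODE.put(op.wireCode, op);
            }
        }

        Op(int wireCode) { this.wireCode = wireCode; }

        int wireCode() { return wireCode; }

        // Decodes a value read from the wire; unknown codes fail loudly.
        static Op fromWireCode(int code) {
            Op op = BY_CODE.get(code);
            if (op == null) {
                throw new IllegalArgumentException("unknown op code: " + code);
            }
            return op;
        }
    }

    public static void main(String[] args) {
        System.out.println(describeLegacy(7));        // unknown
        System.out.println(Op.fromWireCode(2));        // DELETE
        System.out.println(Op.SET_DATA.wireCode());    // 3
    }
}
```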
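The configuration items (no uniform source, values copied into static class members) could be addressed roughly as in the following Java sketch: one immutable config object, built once from java.util.Properties and passed to the components that need it. The class names (ConfigSketch, ServerConfig, ExpiryPolicy) are hypothetical, and the keys tickTime and dataDir are used only as familiar examples; the defaults are illustrative, not ZooKeeper's actual defaults.

```java
import java.util.Properties;

public class ConfigSketch {

    // Immutable value object; the same type can be filled from a file,
    // system properties, or command line parsing.
    static final class ServerConfig {
        final int tickTimeMillis;
        final String dataDir;

        ServerConfig(int tickTimeMillis, String dataDir) {
            if (tickTimeMillis <= 0) {
                throw new IllegalArgumentException("tickTime must be positive");
            }
            this.tickTimeMillis = tickTimeMillis;
            this.dataDir = dataDir;
        }

        // Reads from java.util.Properties; keys and defaults are illustrative.
        static ServerConfig fromProperties(Properties p) {
            int tick = Integer.parseInt(p.getProperty("tickTime", "2000"));
            String dir = p.getProperty("dataDir", "/tmp/zk-data");
            return new ServerConfig(tick, dir);
        }
    }

    // Hypothetical component that receives its config via the constructor
    // instead of reading static fields.
    static final class ExpiryPolicy {
        private final ServerConfig config;
        ExpiryPolicy(ServerConfig config) { this.config = config; }

        // Converts a duration expressed in ticks into milliseconds.
        long timeoutMillis(int ticks) { return (long) ticks * config.tickTimeMillis; }
    }

    public static void main(String[] args) {
        Properties fastProps = new Properties();
        fastProps.setProperty("tickTime", "500");
        // Two differently configured instances coexist, which static
        // configuration members would not allow (e.g. in one test run).
        ExpiryPolicy fast = new ExpiryPolicy(ServerConfig.fromProperties(fastProps));
        ExpiryPolicy standard = new ExpiryPolicy(ServerConfig.fromProperties(new Properties()));
        System.out.println(fast.timeoutMillis(4));     // 2000
        System.out.println(standard.timeoutMillis(4)); // 8000
    }
}
```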
Best regards,

Thomas Koch, http://www.koch.ro