Zookeeper >> mail # dev >> retreat from zookeeper


retreat from zookeeper
Hi,

as you may have noticed, I haven't been active in the ZooKeeper project
for a couple of months. I have been a full-time student again since March, so
any further activity in Hadoop/ZooKeeper would have to be self-motivated.

Since I don't want to just fade away, and since I'll still give a talk about
ZooKeeper at the BerlinBuzzWords conference (Berlin, June 6/7), I have listed
the reasons why I would not want to work on the current ZooKeeper code base
anymore.

I plan the following structure for my talk:

1) theoretical model / protocol of ZooKeeper
2) practical applications, projects using ZooKeeper
3) shortcomings of the current ZooKeeper code base

A tentative brain dump of part three is listed below. I appreciate any
comments that could help me to give a balanced presentation of the ZooKeeper
project.

If I needed a ZooKeeper implementation right now, I'd probably do a
minimal-feature rewrite in Scala + Akka. I do appreciate ZooKeeper as an
invaluable proof-of-concept implementation and pioneer. But as in American
history, the pioneers should be followed by others who no longer look like
Clint Eastwood and who build tidier things.

The list:

* The code is tightly coupled
* most so-called "unit tests" are actually integration tests. They run the
whole application and test one specific piece of functionality.

* no uniform configuration: command line parameters, system properties,
configuration file (java properties)
* configuration properties copied to static class members

* feature bloat on fragile foundation: e.g. chroot + automatic resubscription
does not work

* implementation unlike specification: allowed characters in path

* still on ant instead of maven (depends how you see ant vs. maven)

* circular object dependencies (e.g. ZooKeeper <-> ClientCnxn)

* methods with 100+ lines of code and conditionals nested well over 5 levels deep

* general attitude against refactoring, no knowledge or appreciation of
"Effective Java" (Josh Bloch) or "Clean Code" (Robert C. Martin)

* magic numbers instead of enums

* still bound to inline copy of jute (HadoopIO, avro predecessor)
* even hand coded (de)serialization in leader election

* no client-only jar. Every client gets the full server code.

* unwieldy API has prompted (at least) two client API wrappers: zkClient, cages

* insane amounts of code duplication

* horrible, fragile thread programming: plenty of "XYZ extends Thread"
instead of
  - implements Runnable
  - or better: executor framework
  - or much better: actors (see Akka)
  -> leads to fear of refactoring, because nobody understands all the
synchronization needs.
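
To illustrate the threading point, here is a minimal sketch of the "implements Runnable + executor framework" variant. All names (ExecutorSketch, HeartbeatTask, runHeartbeats) are hypothetical and not taken from the ZooKeeper code base; the sketch only shows the general shape, where the unit of work is separated from the thread that runs it.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example; none of these names exist in ZooKeeper.
public class ExecutorSketch {

    // The work is a plain Runnable: it can be tested by calling run()
    // directly and scheduled on any executor, instead of being tied
    // to a Thread subclass with its own lifecycle.
    static final class HeartbeatTask implements Runnable {
        private final AtomicInteger counter;

        HeartbeatTask(AtomicInteger counter) {
            this.counter = counter;
        }

        @Override
        public void run() {
            counter.incrementAndGet();
        }
    }

    // Submits the task `times` times to a small pool and returns the final count.
    static int runHeartbeats(int times) throws InterruptedException {
        AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            for (int i = 0; i < times; i++) {
                pool.submit(new HeartbeatTask(counter));
            }
        } finally {
            pool.shutdown(); // stop accepting new tasks; already queued tasks still run
        }
        if (!pool.awaitTermination(5, TimeUnit.SECONDS)) {
            throw new IllegalStateException("tasks did not finish in time");
        }
        return counter.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runHeartbeats(10));
    }
}
```

The executor owns thread creation and shutdown, so the task class itself carries no thread lifecycle code, which should make it easier to refactor in isolation.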

Best regards,

Thomas Koch, http://www.koch.ro