I'm working on a fault-tolerant consumer group. The idea is to maximize
Kafka throughput: I request the topic metadata from a broker, create as many
consumers for each topic as it has partitions, and distribute those
consumers across different nodes. There is also a mechanism that detects
when any node fails and restarts it.
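As a rough illustration of the setup described above, here is a minimal sketch of planning one consumer per partition and spreading those consumers round-robin over the available nodes. It assumes the partition counts have already been read from the broker metadata; `assign_partitions` and its arguments are hypothetical names, and the code does not talk to Kafka at all.

```python
from typing import Dict, List, Tuple


def assign_partitions(
    partition_counts: Dict[str, int], nodes: List[str]
) -> Dict[str, List[Tuple[str, int]]]:
    """Plan one consumer per (topic, partition), spread round-robin over nodes.

    Hypothetical helper: `partition_counts` maps topic -> number of partitions,
    assumed to come from the broker's topic metadata (fetching it is out of
    scope here).
    """
    if not nodes:
        raise ValueError("need at least one node")
    plan: Dict[str, List[Tuple[str, int]]] = {node: [] for node in nodes}
    # Sort topics so the same input always yields the same plan,
    # which lets a restarted node recompute what it should host.
    slots = [
        (topic, p)
        for topic in sorted(partition_counts)
        for p in range(partition_counts[topic])
    ]
    # Each slot stands for one consumer process; deal them out round-robin.
    for i, slot in enumerate(slots):
        plan[nodes[i % len(nodes)]].append(slot)
    return plan


plan = assign_partitions({"b": 2, "a": 3}, ["n1", "n2"])
print(plan)
```

One caveat, as far as I understand the ZooKeeper-based high-level consumer: the group itself decides which partition each consumer owns during a rebalance, so a plan like this only controls how many consumers each node runs, not which exact partition a given consumer ends up reading.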
The problem is that if I kill one of the consumer processes, my program
detects it and relaunches a new consumer with the same group id and client
id. But the new consumer hits an error (something like "ZooKeeper entry
doesn't exist"; I didn't keep the log) and never starts.
I think the root cause is a race: ZooKeeper detects that the old consumer
process has failed, but before it deletes that consumer's entry, the new
consumer comes up and registers with ZooKeeper. ZooKeeper then deletes the
entry, so the new consumer is no longer recognized by ZooKeeper.
The sequence is:
  old consumer dies
  -> ZooKeeper detects the failure
  -> new consumer (same group id and client id) comes up
  -> ZooKeeper deletes the old consumer's entry
  -> new consumer gets an error and is not recognized by ZooKeeper
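One possible workaround, offered only as a sketch and not verified against this setup: before relaunching a consumer that reuses the same ids, wait until the dead consumer's registration has actually disappeared. ZooKeeper removes a session's ephemeral nodes only when that session expires, so the wait may need to cover the old consumer's ZooKeeper session timeout. The `exists` and `start_consumer` callables are hypothetical hooks; `exists` might, for example, check the old consumer's node (for the ZooKeeper-based consumer, something under `/consumers/<group>/ids/`) with whatever ZooKeeper client you use. `wait_until_absent` and `relaunch_after_cleanup` are illustrative names, not part of any Kafka API.

```python
import time
from typing import Callable


def wait_until_absent(
    exists: Callable[[], bool],
    timeout_s: float = 30.0,
    poll_interval_s: float = 0.5,
) -> bool:
    """Poll `exists` until it returns False or `timeout_s` elapses.

    Returns True if the old registration disappeared in time, else False.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if not exists():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


def relaunch_after_cleanup(
    exists: Callable[[], bool],
    start_consumer: Callable[[], None],
    timeout_s: float = 30.0,
) -> None:
    """Start the replacement consumer only once the old entry is gone.

    Both callables are hypothetical hooks supplied by the caller; this
    function does not talk to Kafka or ZooKeeper itself.
    """
    if not wait_until_absent(exists, timeout_s=timeout_s):
        raise TimeoutError("old consumer registration still present")
    start_consumer()
```

The design choice here is simply to serialize the two steps that race in the sequence above: the new consumer is started only after the old entry is gone, instead of relying on ZooKeeper's cleanup finishing first.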

It's OK in the sense that I won't lose any data, because that data will go
to the other consumers, but it's annoying because I want to keep the
consumer group balanced after fail-over.

