Sorry it's taken so long to reply. The issue went away after I reassigned
partitions, but now it's back.

I haven't checked JMX, because the brokers and ZooKeeper have been
reporting the same ISR for several hours.

Some more details:

The cluster/topic has
  5 brokers (1, 4, 5, 7, 8)
  15 partitions (0...14)
  2 replicas

A single broker, 4, is the only one missing from the ISR in every case. For
partitions where 4 is the leader (1, 6, 11), it is present in the ISR. For
partitions where 4 is a follower (4, 8, 12), it is not present in the
ISR. Here's the output of my tool, showing assignment and ISR:
https://gist.github.com/also/8012383#file-from-brokers-txt
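
The pattern described above comes down to comparing each partition's assigned
replicas against its ISR. Here is a minimal sketch of that comparison in
Python. It is not the tool that produced the gist. The function name
out_of_sync and the dict layout are assumptions, and the broker ids other
than 4 in the sample data are made up for illustration rather than copied
from the real assignment.

```python
from collections import Counter


def out_of_sync(partitions):
    """Return {partition id: [assigned replicas that are absent from the ISR]}."""
    missing = {}
    for pid, info in partitions.items():
        lagging = [b for b in info["replicas"] if b not in info["isr"]]
        if lagging:
            missing[pid] = lagging
    return missing


# Hypothetical sample: broker 4 leads partitions 1 and 6 and follows on 4 and 8.
# The other broker ids are placeholders, not the real assignment.
partitions = {
    1: {"leader": 4, "replicas": [4, 5], "isr": [4, 5]},
    6: {"leader": 4, "replicas": [4, 7], "isr": [4, 7]},
    4: {"leader": 1, "replicas": [1, 4], "isr": [1]},
    8: {"leader": 5, "replicas": [5, 4], "isr": [5]},
}

missing = out_of_sync(partitions)

# Count how often each broker is missing, to spot a single misbehaving follower.
counts = Counter(b for brokers in missing.values() for b in brokers)

print(missing)
print(counts)
```

If one broker accounts for every entry in the count, as 4 does here, that
points at its follower fetching rather than at any particular partition.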

I haven't seen anything interesting in the logs, but I'm not entirely sure
what to look for. The cluster is currently in this state, and if it goes
like last time, this will persist until I reassign partitions.

What can I do in the meantime to track down the issue?

Thanks,

Ryan

On Thu, Dec 5, 2013 at 12:55 AM, Jun Rao <[EMAIL PROTECTED]> wrote:
 