I'm wondering whether a simple change would be to stop logging full stack
traces for routine failures such as "Connection refused".  It seems fine to
log just the exception message in those cases.
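
To make that concrete, here is a rough sketch of the idea in Java.  It is
only an illustration: the class and method names (QuietLogging, isRoutine,
describe, logFailure) are made up, the choice of which exceptions count as
"routine" is my assumption, and it uses java.util.logging purely to stay
self-contained rather than whatever logging setup the client code uses.

```java
import java.net.ConnectException;
import java.net.SocketTimeoutException;
import java.util.logging.Level;
import java.util.logging.Logger;

public class QuietLogging {
    private static final Logger LOG = Logger.getLogger(QuietLogging.class.getName());

    // Assumption: these exception types are the "simple things" that do not
    // need a stack trace. The real list would have to be agreed on.
    public static boolean isRoutine(Throwable t) {
        return t instanceof ConnectException || t instanceof SocketTimeoutException;
    }

    // One-line summary: context, exception type, and exception message.
    public static String describe(String context, Throwable t) {
        return context + ": " + t.getClass().getSimpleName() + ": " + t.getMessage();
    }

    public static void logFailure(String context, Throwable t) {
        if (isRoutine(t)) {
            // Routine failure: message only, no stack trace.
            LOG.log(Level.WARNING, describe(context, t));
        } else {
            // Unexpected failure: keep the full stack trace.
            LOG.log(Level.WARNING, context, t);
        }
    }
}
```

With this, a refused connection would show up as a single line such as
"fetch: ConnectException: Connection refused", while anything unexpected
still gets its full trace.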

Also, the log levels could be tuned so that anything logged at ERROR means
all possible retries have been attempted, rather than logging an ERROR at
each step of the retry/failover process.  For a redundant, clustered
service, it should be considered normal for individual nodes to be
unavailable (for example, during a rolling restart of the cluster).  It
should be an ERROR only if all brokers or all replicas are unavailable.
That way, we can set our log level to ERROR and still get useful output.
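
As a rough sketch of the level policy I have in mind (again in Java, and
again hypothetical: FailoverLogging and tryAll are made-up names, the broker
list and request function are stand-ins, and java.util.logging's WARNING and
SEVERE play the role of WARN and ERROR):

```java
import java.util.List;
import java.util.function.Function;
import java.util.logging.Level;
import java.util.logging.Logger;

public class FailoverLogging {
    private static final Logger LOG = Logger.getLogger(FailoverLogging.class.getName());

    // Tries each broker in order and returns the first successful result.
    public static <T> T tryAll(List<String> brokers, Function<String, T> request) {
        for (String broker : brokers) {
            try {
                return request.apply(broker);
            } catch (RuntimeException e) {
                // A single node being down is expected in a cluster: WARN, message only.
                LOG.log(Level.WARNING,
                        "Broker " + broker + " unavailable, trying next: " + e.getMessage());
            }
        }
        // Only now has every option been exhausted, so this is the one ERROR.
        LOG.log(Level.SEVERE, "All " + brokers.size() + " brokers unavailable");
        throw new IllegalStateException("no broker available");
    }
}
```

During a rolling restart this would produce at most a few WARN lines, and
nothing at all if the log level is set to ERROR, unless every broker in the
list failed.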

Does this make sense?  If so, I can file a Jira along these lines.

On Mon, Sep 23, 2013 at 9:51 PM, Neha Narkhede <[EMAIL PROTECTED]> wrote: