Re: anecdotal uptime and service monitoring
At LinkedIn, the most common reason a Kafka broker goes down is that we have
to deploy new Kafka code/config. Otherwise, a broker can stay up for a long
time (e.g., months). It would be good to monitor the following metrics at
the broker: log flush time/rate, produce/fetch request and message rates, GC
rate/time, network bandwidth utilization, and disk space and I/O
utilization. For the clients, it would be good to monitor message
size/rate, request time/rate, dropped event rate (for async producers), and
consumption lag (for consumers). For ZK, ideally, one should monitor ZK
request latency and GCs.
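
To make "consumption lag" concrete, here is a rough sketch of how one might
compute it per partition: the log end offset on the broker minus the offset
the consumer has committed. The function names, the dict-based inputs, and
the threshold are all hypothetical and not part of any Kafka API; how you
actually fetch the offsets depends on your Kafka version and tooling.

```python
from typing import Dict, List


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag: log end offset minus committed offset.

    Both arguments map partition id -> offset. A partition with no
    committed offset is treated as committed at 0 (an assumption made
    for this sketch; a real check may want to handle it differently).
    """
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        committed = committed_offsets.get(partition, 0)
        # Clamp at zero in case the two offsets were sampled at
        # slightly different times.
        lag[partition] = max(end_offset - committed, 0)
    return lag


def partitions_over_threshold(lag: Dict[int, int], threshold: int) -> List[int]:
    """Return the sorted partition ids whose lag exceeds the threshold."""
    return sorted(p for p, n in lag.items() if n > threshold)


if __name__ == "__main__":
    # Made-up offsets, for illustration only.
    lag = consumer_lag({0: 100, 1: 250, 2: 50}, {0: 90, 1: 250})
    print(lag)
    print(partitions_over_threshold(lag, threshold=20))
```

Sampling this periodically and alerting when the lag keeps growing is one
way to notice consumers that can't keep up with new messages.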



On Fri, Dec 28, 2012 at 7:27 AM, S Ahmed <[EMAIL PROTECTED]> wrote:

> Curious what kind of uptime have you guys experienced using kafka?
> What sort of monitoring do you suggest should be in place for kafka?
> If the service crashes, does it usually make sense to have something like
> upstart restart the service?
> There are a lot of moving parts (hard drive space, ZooKeeper, producers,
> consumers, etc.)
> Also if the consumers can't keep up with new messages...