-
anecdotal uptime and service monitoring
S Ahmed 2012-12-28, 15:28
Curious what kind of uptime you guys have experienced using Kafka?
What sort of monitoring do you suggest should be in place for Kafka?
If the service crashes, does it usually make sense to have something like upstart restart the service?
There are a lot of moving parts (hard drive space, ZooKeeper, producers, consumers, etc.).
Also if the consumers can't keep up with new messages...
-
Re: anecdotal uptime and service monitoring
Jun Rao 2012-12-28, 22:48
At LinkedIn, the most common cause of Kafka broker downtime is deploying new Kafka code or config. Otherwise, a broker can stay up for a long time (e.g., months). It would be good to monitor the following metrics at the broker: log flush time/rate, produce/fetch request and message rates, GC rate/time, network bandwidth utilization, and disk space and I/O utilization. For the clients, it would be good to monitor message size/rate, request time/rate, dropped event rate (for async producers), and consumption lag (for consumers). For ZK, ideally, one should monitor ZK request latency and GCs.
Thanks,
Jun
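As a rough illustration of the "consumption lag" metric mentioned above, here is a minimal sketch in Python. It assumes lag is measured per partition as the latest offset in the log minus the offset the consumer has reached. The function names, the dictionary inputs, and the alert threshold are hypothetical and chosen for illustration; how the offsets are actually obtained depends on the Kafka version and tooling in use.

```python
from typing import Dict, List


def consumer_lag(log_end_offsets: Dict[int, int],
                 consumed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag: log end offset minus consumed offset."""
    lag = {}
    for partition, end_offset in log_end_offsets.items():
        # Assumption for this sketch: a partition with no recorded
        # consumer offset is treated as consumed up to offset 0.
        consumed = consumed_offsets.get(partition, 0)
        # Clamp at zero in case the two readings were taken at
        # slightly different times.
        lag[partition] = max(end_offset - consumed, 0)
    return lag


def lagging_partitions(lag: Dict[int, int], threshold: int) -> List[int]:
    """Return the sorted partitions whose lag exceeds the threshold."""
    return sorted(p for p, n in lag.items() if n > threshold)


if __name__ == "__main__":
    lag = consumer_lag({0: 100, 1: 50}, {0: 90, 1: 50})
    print(lag)
    print(lagging_partitions(lag, threshold=5))
```

A lag that keeps growing over successive samples is the usual sign that consumers cannot keep up with new messages, which is why tracking the trend tends to be more useful than a single reading.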
-
Re: anecdotal uptime and service monitoring
S Ahmed 2013-01-30, 01:39
Jun,
Great list. I haven't really set up monitoring before, so for starters, what should I be researching in order to monitor those metrics? Are they exposed via the Yammer metrics library, which can export to a CSV file, or are these JMX-related items?
-
Re: anecdotal uptime and service monitoring
Jun Rao 2013-01-30, 04:56
In 0.8, we use the metrics package to expose the metrics as JMX beans, and it also supports a CSV reporter.
Thanks,
Jun
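If the CSV reporter route is taken, a small script can read the most recent sample of a metric from one of the generated files. The sketch below is in Python and only assumes a CSV file with a header row followed by one row per sample. The column names (`time`, `count`, `mean_rate`) and the sample data are hypothetical; check the files your reporter actually writes before relying on them.

```python
import csv
from typing import Iterable, Optional


def latest_value(lines: Iterable[str], column: str) -> Optional[float]:
    """Return `column` from the last data row, or None if there are no rows."""
    reader = csv.DictReader(lines)
    last_row = None
    for row in reader:
        last_row = row
    if last_row is None:
        return None
    return float(last_row[column])


if __name__ == "__main__":
    # Hypothetical sample with an assumed column layout.
    sample = [
        "time,count,mean_rate\n",
        "10,120,12.0\n",
        "20,300,15.0\n",
    ]
    print(latest_value(sample, "mean_rate"))
```

To read a real file, pass an open file object instead of the list, for example `with open(path, newline="") as f: latest_value(f, "mean_rate")`, where `path` is wherever the reporter is configured to write.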