Jonathan Creasy
2012-07-28, 01:00
Jay Kreps
2012-07-28, 03:32
Pierre-Yves Ritschard
2012-07-28, 09:07
Jonathan Creasy
2012-07-30, 23:25
Jonathan Creasy
2012-07-31, 01:23
-
monitoring kafka
Jonathan Creasy 2012-07-28, 01:00
How do you guys monitor Kafka? Do any of you have Nagios checks that you
use? What metrics do you find important?
-
Re: monitoring kafka
Jay Kreps 2012-07-28, 03:32
LinkedIn has a custom monitoring system partially described here:
http://engineering.linkedin.com/52/autometrics-self-service-metrics-collection

The integration on the Kafka side is basically just JMX, though we have a few wrappers that expose additional things. We measure basic stuff like disk stats, messages/sec, latency, etc.

In addition, we do a very Kafka-specific kind of monitoring we call "audit". This counts the number of messages sent by every producer, received by every broker, and received by every consumer, then reconciles, graphs, and alerts on these counts. This is very helpful in determining that all the sent data arrived at its destination. There is a bug open to open-source this piece, though it has a few dependencies:
https://issues.apache.org/jira/browse/KAFKA-260

-Jay
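The reconciliation step of the audit Jay describes can be illustrated with a small sketch. This is not LinkedIn's code or the KAFKA-260 patch; it assumes each tier has already reported a message count for the same (topic, time window) bucket, and the names `AuditSketch`, `TierCounts`, and `reconcile` are hypothetical.

```scala
// Hypothetical audit reconciliation (not LinkedIn's implementation):
// compare how many messages each tier saw for one (topic, time window) bucket.
object AuditSketch {
  // Counts reported by each tier for the same bucket.
  final case class TierCounts(producer: Long, broker: Long, consumer: Long)

  sealed trait Status
  case object Complete extends Status
  // `tier` is the first tier that came up short; `lost` is by how many messages.
  final case class Missing(tier: String, lost: Long) extends Status

  // Walks producer -> broker -> consumer and reports the first hop whose
  // downstream count is lower than its upstream count by more than `tolerance`.
  def reconcile(c: TierCounts, tolerance: Long = 0L): Status = {
    val brokerGap = c.producer - c.broker
    val consumerGap = c.broker - c.consumer
    if (brokerGap > tolerance) Missing("broker", brokerGap)
    else if (consumerGap > tolerance) Missing("consumer", consumerGap)
    else Complete
  }
}
```

For example, `reconcile(TierCounts(100, 100, 90))` reports `Missing("consumer", 10)`. A nonzero `tolerance` is one plausible way to avoid alerting on messages that are merely still in flight when a window is checked; graphing and alerting would sit on top of results like these.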
-
Re: monitoring kafka
Pierre-Yves Ritschard 2012-07-28, 09:07
I use the standard checks to verify that the process is running, plus a check in ZooKeeper for correct partition ownership and for the number of registered brokers / consumers / producers.
Collectd runs on all my machines and pushes JMX metrics out to Graphite. I then use check-graphite, which allows checking for consumer lag.
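A consumer-lag check of the kind Pierre-Yves mentions can be sketched as below. This is a minimal illustration, not check-graphite itself: it assumes the log-end offset and the consumer's consumed offset for each partition have already been fetched (for example via JMX/Graphite and ZooKeeper), and the names `LagCheckSketch`, `totalLag`, and `check` are hypothetical. The only external convention it relies on is the Nagios plugin exit codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN). In Kafka 0.7 offsets are byte positions, so the lag computed here would be in bytes rather than messages.

```scala
// Hypothetical Nagios-style consumer-lag check. Offsets are passed in as maps
// keyed by partition id; fetching them is deliberately out of scope.
object LagCheckSketch {
  // Standard Nagios plugin exit codes.
  val Ok = 0
  val Warning = 1
  val Critical = 2
  val Unknown = 3

  // Total lag across partitions: log-end offset minus consumed offset.
  // Returns None if any partition has no consumed offset recorded.
  def totalLag(logEnd: Map[Int, Long], consumed: Map[Int, Long]): Option[Long] = {
    val perPartition = logEnd.toSeq.map { case (partition, end) =>
      consumed.get(partition).map(c => math.max(0L, end - c))
    }
    if (perPartition.exists(_.isEmpty)) None
    else Some(perPartition.flatten.sum)
  }

  // Maps the lag onto a Nagios status code plus a one-line status message.
  def check(logEnd: Map[Int, Long], consumed: Map[Int, Long],
            warn: Long, crit: Long): (Int, String) =
    totalLag(logEnd, consumed) match {
      case None                      => (Unknown, "UNKNOWN - missing consumer offset")
      case Some(lag) if lag >= crit  => (Critical, s"CRITICAL - lag=$lag")
      case Some(lag) if lag >= warn  => (Warning, s"WARNING - lag=$lag")
      case Some(lag)                 => (Ok, s"OK - lag=$lag")
    }
}
```

A plugin wrapper would print the message and exit with the returned code (e.g. via `sys.exit`). Treating a missing offset as UNKNOWN rather than OK is a deliberate choice so that a consumer that never registered does not look healthy.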
-
Re: monitoring kafka
Jonathan Creasy 2012-07-30, 23:25
Checking out the audit code, the patch in KAFKA-260 doesn't apply for me; there is a problem in core/src/main/scala/kafka/consumer/ConsumerIterator.scala. I am working with the 0.7.1 branch.

The section now looks like:

    val item = localCurrent.next()
    consumedOffset = item.offset
    new MessageAndMetadata(decoder.toEvent(item.message), currentTopicInfo.topic)

Should I change decoder.toEvent(item.message) to decoder.fromMessage(item.message)?

    ***************
    *** 80,86 ****
      }
      val item = localCurrent.next()
      consumedOffset = item.offset
    - decoder.toEvent(item.message)
      }

      def clearCurrentChunk() = {
    --- 80,86 ----
      }
      val item = localCurrent.next()
      consumedOffset = item.offset
    + decoder.fromMessage(item.message)
      }

      def clearCurrentChunk() = {
-
Re: monitoring kafka
Jonathan Creasy 2012-07-31, 01:23
Never mind.