Re: Kafka Monitoring
Updated the doc at http://kafka.apache.org/documentation.html#monitoring

Hopefully that answers your questions.

Thanks,

Jun
On Tue, Sep 3, 2013 at 11:16 PM, Vadim Keylis <[EMAIL PROTECTED]> wrote:

> Good evening. I have read through the monitoring section and tried to map
> each item to the corresponding JMX attribute. I would appreciate it if you
> could answer a few questions below.
>
> Thanks so much in advance,
> Vadim
>
>     What is this JMX MBean for?
>     "kafka.controller":type="KafkaController",name="ActiveControllerCount"
>
>     The rate of data in and out of the cluster and the number of messages
>     written
>     Which JMX attributes should I monitor? Since I should alert on this, what
>     changes are acceptable, and what are not?
>     The log flush rate and the time taken to flush the log
>     "kafka.log":type="LogFlushStats",name="LogFlushRateAndTimeMs"
>     Which attribute should I be watching, and how much deviation is
>     acceptable before I should alert?
>     The number of partitions that have replicas that are down or have
>     fallen behind and are under-replicated
>     Is this the JMX MBean that will show replicas that are down?
>     "kafka.cluster":type="Partition",name="buypets-0-UnderReplicated"
>
>     Unclean leader elections. This shouldn't happen.
>     "kafka.controller":type="ControllerStats",name="UncleanLeaderElectionsPerSec"
>     I assume this should always be 0, and if it is not 0 we have a problem.
>     Number of partitions each node is the leader for
>     Which JMX attribute(s) monitor this?
>     Leader elections: we track each time this happens and how long it took
>     "kafka.controller":type="ControllerStats",name="LeaderElectionRateAndTimeMs"
>     Any changes to the ISR
>     Which JMX attribute should I monitor for this? Should I alert on this?
>     What are reasonable changes? Which are not?
>     The number of produce requests waiting on replication to report back
>     Which JMX attribute should I monitor for this? Should I alert on this?
>     What are reasonable changes? Which are not?
>     The number of fetch requests waiting on data to arrive
>     Which JMX attribute should I monitor for this? Should I alert on this?
>     What are reasonable changes? Which are not?
>
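
For readers who want to turn a few of the MBeans quoted above into alerts, here is a minimal sketch of the checking step only. It assumes the values have already been read over JMX by some poller and handed over as a plain dict per broker; the dict layout, the short metric keys, and the function name `check_cluster` are all hypothetical. The rules themselves are assumptions too: that exactly one broker should report ActiveControllerCount = 1, that any non-zero UncleanLeaderElectionsPerSec reading is a problem (as the question above supposes), and that a per-partition UnderReplicated gauge is non-zero while the partition is under-replicated.

```python
from typing import Dict, List

# Readings per broker: broker id -> {metric name -> last value read over JMX}.
# The dict layout and the short metric keys are assumptions made for this sketch.
Readings = Dict[str, Dict[str, float]]


def check_cluster(readings: Readings) -> List[str]:
    """Return human-readable alerts for one polling round."""
    alerts = []

    # Assumption: exactly one broker reports ActiveControllerCount == 1.
    controllers = sum(m.get("ActiveControllerCount", 0) for m in readings.values())
    if controllers != 1:
        alerts.append(f"expected 1 active controller, found {controllers}")

    for broker, metrics in sorted(readings.items()):
        # The thread expects unclean leader elections never to happen.
        if metrics.get("UncleanLeaderElectionsPerSec", 0) > 0:
            alerts.append(f"broker {broker}: unclean leader election seen")

        # Assumption: a "<topic>-<partition>-UnderReplicated" gauge is
        # non-zero while that partition is under-replicated.
        for name, value in sorted(metrics.items()):
            if name.endswith("-UnderReplicated") and value > 0:
                alerts.append(f"broker {broker}: {name} = {value}")

    return alerts
```

Calling `check_cluster` with one entry per broker returns an empty list when none of these rules fires. Thresholds for the rate-style metrics (bytes in and out, flush time, request queues) are deliberately left out, since acceptable ranges depend on the workload and are exactly what the questions above ask about.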