Re: Metrics in new producer
Jay Kreps 2014-02-06, 21:51
Also, here is the javadoc for this package:
On Thu, Feb 6, 2014 at 12:51 PM, Jay Kreps <[EMAIL PROTECTED]> wrote:
> Hey guys,
> I wanted to kick off a quick discussion of metrics with respect to the new
> producer and consumer (and potentially the server).
> At a high level I think there are three approaches we could take:
> 1. Plain vanilla JMX
> 2. Use Coda Hale (AKA Yammer) Metrics
> 3. Do our own metrics (with JMX as one output)
> 1. Has the advantage that JMX is the most commonly used Java mechanism and
> plugs in reasonably to most metrics systems. JMX is included in the JDK so
> it doesn't impose any additional dependencies on clients. It has the
> disadvantage that plain vanilla JMX is a pain to use. We would need a bunch
> of helper code for maintaining counters to make this reasonable.
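To give a sense of the helper code option 1 would need, here is a minimal sketch of a single counter exposed through plain JMX as a standard MBean. The class names and the ObjectName are made up for illustration and are not from the Kafka code base; only the javax.management and java.lang.management calls are JDK API.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicLong;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class PlainJmxCounterExample {

    // Standard MBean convention: the management interface must be public and
    // named <implementation class name> + "MBean". Each getter becomes a
    // read-only JMX attribute ("Count" here).
    public interface MessageCounterMBean {
        long getCount();
    }

    // Hypothetical helper: a thread-safe counter that JMX can read.
    public static class MessageCounter implements MessageCounterMBean {
        private final AtomicLong count = new AtomicLong();

        public void increment() {
            count.incrementAndGet();
        }

        @Override
        public long getCount() {
            return count.get();
        }
    }

    public static void main(String[] args) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // Illustrative name only; any valid domain:key=value pair works.
        ObjectName name = new ObjectName("kafka.producer:type=message-count");

        MessageCounter counter = new MessageCounter();
        server.registerMBean(counter, name);

        counter.increment();
        // Read the attribute back through the MBean server, as a JMX client would.
        System.out.println(server.getAttribute(name, "Count"));
    }
}
```

Every additional statistic (average, max, rate) would need its own interface method and bookkeeping along these lines, which is the overhead described above.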
> 2. Coda Hale metrics is pretty good and broadly used. It supports JMX
> output as well as direct output to many other types of systems. The primary
> downside we have had with Coda Hale has to do with the clients and library
> incompatibilities. We are currently on an older, more popular version. The
> newer version is a rewrite of the APIs and is incompatible. Originally
> these were totally incompatible and people had to choose one or the other.
> I think that has been improved, so now the new version is in a totally
> different package. But even in this case you end up with both versions if
> you use Kafka and we are on a different version than you, which is going to
> be pretty inconvenient.
> 3. Doing our own has the downside of potentially reinventing the wheel,
> and potentially needing to work out any bugs in our code. The upsides would
> depend on how good the reinvention was. As it happens I did a quick
> (~900 loc) version of a metrics library that is under kafka.common.metrics.
> I think it has some advantages over the Yammer metrics package for our
> usage beyond just not causing incompatibilities. I will describe this code
> so we can discuss the pros and cons. Although I favor this approach I have
> no emotional attachment and wouldn't be too sad if I ended up deleting it.
> Here are javadocs for this code, though I haven't written much
> documentation yet since I might end up deleting it:
> Here is a quick overview of this library.
> There are three main public interfaces:
> Metrics - This is a repository of metrics being tracked.
> Metric - A single, named numerical value being measured (e.g. a counter).
> Sensor - This is a thing that records values and updates zero or more
> metrics.
> So let's say we want to track four values about message sizes;
> specifically say we want to record the average, the maximum, the total rate
> of bytes being sent, and a count of messages. Then we would do something
> like this:
> // setup code
> Metrics metrics = new Metrics(); // this is a global "singleton"
> Sensor sensor = metrics.sensor("kafka.producer.message.sizes");
> sensor.add("kafka.producer.message-size.avg", new Avg());
> sensor.add("kafka.producer.message-size.max", new Max());
> sensor.add("kafka.producer.bytes-sent-per-sec", new Rate());
> sensor.add("kafka.producer.message-count", new Count());
> // now when we get a message we do this
> The above code creates the global metrics repository, creates a single
> Sensor, and defines four named metrics that are updated by that Sensor.
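To make the Sensor/Metric relationship concrete, here is a small self-contained sketch of the pattern: one recorded value fans out to every statistic attached to the sensor. This is not the kafka.common.metrics code. All classes below are hypothetical stand-ins, the `record` method name is an assumption, `Rate` is left out because it needs a time window, and nothing here is thread-safe.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SensorSketch {

    // Hypothetical statistic: consumes recorded values, exposes one number.
    interface Stat {
        void record(double value);
        double value();
    }

    static class Avg implements Stat {
        private double sum;
        private long n;
        public void record(double v) { sum += v; n++; }
        public double value() { return n == 0 ? 0.0 : sum / n; }
    }

    static class Max implements Stat {
        private double max = Double.NEGATIVE_INFINITY;
        public void record(double v) { max = Math.max(max, v); }
        public double value() { return max; }
    }

    static class Count implements Stat {
        private long n;
        public void record(double v) { n++; }
        public double value() { return n; }
    }

    // A sensor forwards each recorded value to all of its stats.
    static class Sensor {
        private final Map<String, Stat> registry;
        private final List<Stat> stats = new ArrayList<>();

        Sensor(Map<String, Stat> registry) { this.registry = registry; }

        void add(String metricName, Stat stat) {
            stats.add(stat);
            registry.put(metricName, stat); // metric is looked up by its own name
        }

        void record(double value) {
            for (Stat s : stats) {
                s.record(value);
            }
        }
    }

    // Repository of sensors and of the named metrics they update.
    static class Metrics {
        private final Map<String, Sensor> sensors = new LinkedHashMap<>();
        private final Map<String, Stat> metrics = new LinkedHashMap<>();

        Sensor sensor(String name) {
            return sensors.computeIfAbsent(name, k -> new Sensor(metrics));
        }

        double value(String metricName) {
            return metrics.get(metricName).value();
        }
    }

    public static void main(String[] args) {
        Metrics metrics = new Metrics();
        Sensor sensor = metrics.sensor("kafka.producer.message.sizes");
        sensor.add("kafka.producer.message-size.avg", new Avg());
        sensor.add("kafka.producer.message-size.max", new Max());
        sensor.add("kafka.producer.message-count", new Count());

        // One call per message updates every metric attached to the sensor.
        sensor.record(100);
        sensor.record(300);

        System.out.println(metrics.value("kafka.producer.message-size.avg"));
        System.out.println(metrics.value("kafka.producer.message-size.max"));
        System.out.println(metrics.value("kafka.producer.message-count"));
    }
}
```

The point of the design is that the hot path makes a single call per event, while the set of derived metrics is decided once at setup time.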
> Like Yammer Metrics (YM) I allow you to plug in "reporters", including a
> JMX reporter. Unlike the Coda Hale JMX reporter, the reporter I have keys
> off the metric names, not the Sensor names, which I think is an
> improvement--I just use the convention that the last portion of the name is
> the attribute name, the second to last is the mbean name, and the rest is
> the package. So in the above example there is a message-size mbean that has
> avg and max attributes and a producer mbean that has a bytes-sent-per-sec