Accumulo >> mail # dev >> GSOC: Monitor Improvements


Re: GSOC: Monitor Improvements
Great. We could certainly introduce the graph Mike and Keith have
mentioned.
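
As a rough sketch of the data behind that graph (time on the x-axis, latency on a log-scale y-axis, cell color taken from the request count), something like the following could work. The function name, the bucket sizes, and the (timestamp, latency) sample format are assumptions made for illustration; none of this is existing Monitor code.

```python
import math
from collections import Counter


def latency_heatmap(samples, time_bucket_s=60, buckets_per_decade=4):
    """Count requests per (time bucket, log10 latency bucket) cell.

    samples: iterable of (timestamp_seconds, latency_seconds) pairs.
    This input format is hypothetical, not something the Monitor emits.
    """
    counts = Counter()
    for timestamp_s, latency_s in samples:
        if latency_s <= 0:
            continue  # log scale cannot represent zero or negative latency
        x = int(timestamp_s // time_bucket_s)
        # Log-scale y bucket, so one very slow operation does not
        # flatten all the small latencies into a single row.
        y = math.floor(math.log10(latency_s) * buckets_per_decade)
        counts[(x, y)] += 1
    return counts
```

Each (x, y) key is one cell of the chart and its count drives the color gradient. A small second population then shows up as its own cluster of cells instead of being folded into an average.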

Supun..
On Mon, Apr 22, 2013 at 12:02 PM, Keith Turner <[EMAIL PROTECTED]> wrote:

> On Mon, Apr 22, 2013 at 11:42 AM, Mike Drob <[EMAIL PROTECTED]> wrote:
>
> > Adding on to the comment about summaries, averages, and outliers. If, for
> > some reason, you end up with a two-hump population, then simply showing
> > averages will mask the split and lose a lot of valuable information. It is
> > often valuable to know that a particular set of users or servers is
> > experiencing degraded performance while the rest of the ecosystem is
> > healthy.
> >
> > This isn't something that shows up in a regular time series because the
> > secondary population is usually very small compared to the total
> > population. There was a graph for request latency of a service that I saw
> > once that I really wish I could find again; maybe somebody on the list will
> > be able to chime in. It had timestamps on the x-axis, latency on the y,
> > and each (x,y) point was colored on a gradient representing how many
> > requests were fulfilled at time x with latency y. This chart made it
> > immediately easy to see that most data points fit a normal distribution
> > with a low mean, but there was also a cluster at the top for some reason.
> >
>
>
> That sounds really cool.  Maybe the y-axis/latency could be log scale.
> Inevitably a 3004 second operation will finish and obscure the
> smaller latencies.
>
> Sometimes it's more useful to sample this type of info from the clients
> rather than tablet servers.   A tablet server may report low latencies, but
> all clients using it may experience high latencies because of a network issue.
> We could certainly consider making the client code report this info.
>
>
> >
> > I'd love to see that type of chart show up for tablet servers (probably not
> > as useful for tables).
> >
> > Mike
> >
> >
> > On Mon, Apr 22, 2013 at 9:05 AM, Eric Newton <[EMAIL PROTECTED]> wrote:
> >
> > > Another thing to consider is scale.  On large clusters (many hundreds of
> > > nodes), more data is not helpful for visualization.  Instead, summaries,
> > > averages and outliers are important.
> > >
> > > For example, if one node is consistently slow, it is better to know that
> > > than to see one graph with low numbers in a sea of graphs.
> > >
> > > If the monitor collects information using JMX, collection time for each
> > > node would be a good thing to know, too.
> > >
> > > -Eric
> > >
> > >
> > > On Sun, Apr 21, 2013 at 10:00 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> > >
> > > > Supun,
> > > >
> > > > Yup, very much so. Having a way to consume any and all metrics via JMX
> > > > would simplify things for any consumers (internal or external).
> > > >
> > > >
> > > >
> > > > On 04/21/2013 02:15 PM, Supun Kamburugamuva wrote:
> > > >
> > > >> Hi Josh,
> > > >>
> > > > >> Thanks for the suggestions. I'll incorporate these into the proposal.
> > > > >>
> > > > >> Another area I would like to work on is JMX. There is a Jira that says
> > > > >> to replace the Monitor calls from Thrift to JMX (ACCUMULO-694). Do you
> > > > >> think this is a good addition to the Monitor?
> > > >>
> > > >> Thanks,
> > > >> Supun..
> > > >>
> > > >>
> > > > >> On Sun, Apr 21, 2013 at 1:45 PM, Josh Elser <[EMAIL PROTECTED]> wrote:
> > > >>
> > > > >>> Supun,
> > > > >>>
> > > > >>> Looks good! Can I make some suggestions/comments?
> > > > >>>
> > > > >>> For: "Per table plots: ACCUMULO-594", I'd also like to see minor
> > > > >>> compactions, major compactions, index cache hit rate, and data cache
> > > > >>> hit rate per table (same graphs that are displayed system-wide when
> > > > >>> you visit http://${MONITOR_HOST}:50095/).
> > > > >>>
> > > > >>> For "Per tablet [server] plots", it would be neat if you could also
> > > > >>> extract some general statistics like top N least performing, top N
> > > > >>> highest performing, etc. tablet servers. Ideally, this could correlate with

Supun Kamburugamuva
Member, Apache Software Foundation; http://www.apache.org
E-mail: [EMAIL PROTECTED];  Mobile: +1 812 369 6762
Blog: http://supunk.blogspot.com