Accumulo, mail # dev - GSOC: Monitor Improvements


Re: GSOC: Monitor Improvements
Keith Turner 2013-04-22, 16:02
On Mon, Apr 22, 2013 at 11:42 AM, Mike Drob <[EMAIL PROTECTED]> wrote:

> Adding on to the comment about summaries, averages, and outliers. If, for
> some reason, you end up with a two-hump population, then simply showing
> averages will mask the split and lose a lot of valuable information. It is
> often valuable to know that a particular set of users or servers are
> experiencing degraded performance while the rest of the ecosystem is
> healthy.
>
> This isn't something that shows up in a regular time series because the
> secondary population is usually very small compared to the total
> population. There was a graph for request latency of a service that I saw
> once that I really wish I could find again, maybe somebody on the list will
> be able to chime in - It had timestamps on the x-axis, latency on the y,
> and each (x,y) point was colored on a gradient representing how many
> requests were fulfilled at time x with latency y. This chart made it
> immediately easy to see that most data points fit a normal distribution
> with a low mean, but there was also a cluster at the top for some reason.
>
That sounds really cool.  Maybe the y-axis/latency could be log scale.
Inevitably a 3004 second operation will finish and obscure the
smaller latencies.
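
A rough sketch of the binning behind such a chart, assuming the samples arrive as (timestamp, latency) pairs in seconds. The function name, bucket sizes, and sample values are made up for illustration and are not part of the monitor. Each cell counts the requests that fell into one time bucket and one log10 latency bucket, so a single very slow operation lands in its own row instead of flattening the rest.

```python
import math
from collections import Counter


def latency_heatmap(samples, time_bucket_s=60, buckets_per_decade=4):
    """Count requests per (time bucket, log10 latency bucket).

    samples: iterable of (timestamp_seconds, latency_seconds) pairs.
    Returns a Counter keyed by (time_index, latency_index); the count
    is what a chart would map onto the color gradient.
    """
    counts = Counter()
    for ts, latency in samples:
        if latency <= 0:
            continue  # a log scale cannot represent non-positive values
        time_index = int(ts // time_bucket_s)
        # Equal-width buckets in log10 space: each decade of latency
        # gets the same number of rows on the y-axis.
        latency_index = math.floor(math.log10(latency) * buckets_per_decade)
        counts[(time_index, latency_index)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical data: mostly millisecond-range requests plus one slow one.
    demo = [(0, 0.005), (10, 0.006), (30, 0.004), (70, 0.005), (75, 250.0)]
    for cell, n in sorted(latency_heatmap(demo, buckets_per_decade=1).items()):
        print(cell, n)
```

With one bucket per decade, the fast requests share the 1-10 ms row while the slow one sits five rows higher, so both populations stay visible.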

Sometimes it's more useful to sample this type of info from the clients
rather than tablet servers.   A tablet server may report low latencies, but
all clients using it may experience high latencies because of a network issue.
  We could certainly consider making the client code report this info.
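
As a minimal sketch of what client-side sampling could look like, assuming nothing about the real Accumulo client API: the class and method names below are hypothetical. The timer wraps the whole call as the client sees it, so network delay is included in the recorded latency.

```python
import time
from contextlib import contextmanager


class ClientLatencyLog:
    """Collects latencies as observed by the caller (hypothetical helper)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock  # injectable so the timing can be tested
        self.samples = []    # list of (operation name, elapsed seconds)

    @contextmanager
    def timed(self, operation):
        start = self._clock()
        try:
            yield
        finally:
            # Record even if the wrapped call raised; failed or timed-out
            # requests are often the interesting outliers.
            self.samples.append((operation, self._clock() - start))


if __name__ == "__main__":
    log = ClientLatencyLog()
    with log.timed("scan"):
        time.sleep(0.01)  # stand-in for a real client request
    print(log.samples)
```

The collected samples could then be reported to whatever aggregates the server-side numbers; how that reporting would work is left open here.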
>
> I'd love to see that type of chart show up for tablet servers (probably not
> as useful for tables).
>
> Mike
>
>
> On Mon, Apr 22, 2013 at 9:05 AM, Eric Newton <[EMAIL PROTECTED]>
> wrote:
>
> > Another thing to consider is scale.  On large clusters (many hundreds of
> > nodes), more data is not helpful for visualization.  Instead, summaries,
> > averages and outliers are important.
> >
> > For example, if one node is consistently slow, it is better to know that
> > than to see one graph with low numbers in a sea of graphs.
> >
> > If the monitor collects information using JMX, collection time for each
> > node would be a good thing to know, too.
> >
> > -Eric
> >
> >
> > On Sun, Apr 21, 2013 at 10:00 PM, Josh Elser <[EMAIL PROTECTED]>
> wrote:
> >
> > > Supun,
> > >
> > > Yup, very much so. Having a way to consume any and all metrics via JMX
> > > would simplify things for any consumers (internal or external).
> > >
> > >
> > >
> > > On 04/21/2013 02:15 PM, Supun Kamburugamuva wrote:
> > >
> > >> Hi Josh,
> > >>
> > >> Thanks for the suggestions. I'll incorporate these to the proposal.
> > >>
> > >> Another area I would like to work is on JMX. There is a Jira that says
> > to
> > >> replace the Monitor calls from Thrift to JMX (Accumulo 694). Do you
> > think
> > >> this is a good addition to the Monitor?
> > >>
> > >> Thanks,
> > >> Supun..
> > >>
> > >>
> > >> On Sun, Apr 21, 2013 at 1:45 PM, Josh Elser <[EMAIL PROTECTED]>
> > wrote:
> > >>
> > >>  Supun,
> > >>>
> > >>> Looks good! Can I make some suggestions/comments?
> > >>>
> > >>> For: "Per table plots: ACCUMULO-594", I'd also like to see minor
> > >>> compactions, major compactions, index cache hit rate, and data cache
> > hit
> > >>> rate per table (same graphs that are displayed system-wide when you
> > visit
> > >>> http://${MONITOR_HOST}:50095/).
> > >>>
> > >>> For "Per tablet [server] plots", it would be neat if you could also
> > >>> extract some general statistics like top N least performing, top N
> > >>> highest
> > >>> performing, etc. tablet servers. Ideally, this could correlate with
> > >>> servers
> > >>> that may be having problems :).
> > >>>
> > >>> Do you see these proposed changes as being sufficient for 3-4 months
> of
> > >>> 40hrs/week work? If you plan to really dig into these changes
> (perhaps
> > >>> reworking components of the monitor itself), I could perhaps see
> this.
> > Do
> > >>> you have any ideas for more lofty goals that you could pursue as